BRCA1 ASSOCIATED PROTEIN (BAP-1) AND USES THEREFOR

This invention was made under work supported by 
National Institutes of Health, Grant Nos. CA52009, 
DK49210, and TM54220. The United States Government has 
certain rights in this invention.

Field of the Invention

This invention relates generally to the field of 
genes associated with cancers, and particularly, to 
BRCA1.

Background of the Invention

The breast and ovarian cancer susceptibility gene,
BRCA1, is linked to the hereditary form of breast cancer.
The BRCA1 gene is located on chromosome 17 at the locus
17q21 and encodes a protein of 1863 amino acids. The
BRCA1 locus spans >100 kb comprising 24 exons [Miki et
al, Science, 266:66-71 (1994)]. Expression of wild-type
BRCA1 inhibits colony formation and tumor growth in vivo,
whereas tumor-derived mutations of BRCA1 abolish this
growth suppression [Holt et al, Nature Genetics, 12:298-302
(1996)]. Germline mutations in BRCA1 appear to
account for 50% of familial breast cancers and
essentially all families with 17q21-linked inherited
susceptibility to ovarian and breast cancer [Szabo et al,
Hum. Mol. Genet., 4:1811-1817 (1995); Hall et al,
Science, 250:1684-1689 (1990); Easton et al, Am. J. Hum.
Genet., 56:265-271 (1995); Narod et al, Am. J. Hum.
Genet., 56:254-264 (1995)]. Kindreds segregating
constitutive BRCA1 mutations show a lifetime risk of
40-50% for ovarian cancer and >80% for breast cancer
[Easton et al, Am. J. Hum. Genet., 52:678-701 (1993);
Easton et al, Am. J. Hum. Genet., 56:265-271 (1995)].

The classification of BRCA1 as a highly penetrant,
autosomal dominant tumor suppressor gene has been
genetically confirmed by the finding of frequent loss or
mutation (LOH) of the wild-type allele in breast tumors
from mutation carriers [Hall et al, Science, 250:1684-1689
(1990); Miki et al, cited above; Smith et al, Nature
Genetics, 2:128-131 (1992)]. Surprisingly, BRCA1
mutations in sporadic breast cancer, including those which
show LOH, have yet to be found and are extremely rare in
sporadic ovarian cancer [Futreal et al, Science, 266:120-122
(1994); Merajver et al, Nature Genetics, 9:439-443
(1995)].

Although the BRCA1 protein resembles no known
protein, it does contain a RING domain at its amino
terminus [Miki, cited above; Bienstock et al, Cancer
Res., 56:2539-2545 (1996)]. The RING finger domain is a
complicated structure which chelates two zinc atoms using
7 Cys residues and 1 His residue [C3HC4; Lovering et al,
Proc. Natl. Acad. Sci. USA, 90:2112-2116 (1993); Freemont
et al, Ann. NY Acad. Sci., 684:174-192 (1993)]. This
domain is present in a wide variety of proteins with
various functions, but the function of the RING finger
domain within these proteins is unknown [for a review see
Saurin, Trends in Biochem. Sci., 21:208-214 (1996)]. The
RING finger of BRCA1 is important to its function since
missense mutations in the RING domain (Cys61Gly and
Cys64Gly) are found in breast/ovarian kindreds [Friedman
et al, Nat. Genet., 8:399-404 (1994); Merajver, cited
above; Castilla et al, Nature Genet., 8:387-391 (1994)].
In addition, the RING finger domain is the most conserved
region of BRCA1, when comparing the human, mouse and rat
proteins. The BRCA1 RING finger is anticipated to be a
binding site for protein(s) which either mediate BRCA1
tumor suppressor function or serve to regulate these
functions. Genetic evidence supports this in that single
amino-acid substitutions at metal chelating cysteines,
C61G and C64G, occur in kindreds; these mutations
segregate with the disease susceptibility phenotype and
are predicted to abolish RING finger structure.
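
The C3HC4 arrangement described above can be expressed as a simple pattern search. The following Python sketch is illustrative only: the spacing consensus used (C-X2-C-X(9-39)-C-X(1-3)-H-X(2-3)-C-X2-C-X(4-48)-C-X2-C) is an assumption taken from the general RING finger literature rather than from this specification, the function names are hypothetical, and the toy sequence is invented for the example and is not a BRCA1 sequence. It shows why replacing a single metal-chelating cysteine with glycine, as in the C61G and C64G substitutions, removes the match to the consensus.

```python
import re

# Commonly cited C3HC4 RING finger spacing consensus (an assumption taken
# from the general literature, not from this specification).
RING_CONSENSUS = re.compile(
    r"C.{2}C.{9,39}C.{1,3}H.{2,3}C.{2}C.{4,48}C.{2}C"
)


def has_ring_consensus(protein: str) -> bool:
    """Return True if the sequence contains a C3HC4-like spacing pattern."""
    return RING_CONSENSUS.search(protein.upper()) is not None


def substitute(protein: str, position: int, residue: str) -> str:
    """Replace the residue at a 1-based position."""
    return protein[: position - 1] + residue + protein[position:]


# Hypothetical toy sequence built to satisfy the consensus; NOT BRCA1.
TOY_RING = "MK" + "CAAC" + "G" * 10 + "CSHTTCAAC" + "G" * 6 + "CPPC" + "KL"

# Mutate the seventh chelating cysteine of the toy sequence to glycine.
seventh_cys = [i + 1 for i, aa in enumerate(TOY_RING) if aa == "C"][6]
TOY_MUTANT = substitute(TOY_RING, seventh_cys, "G")

print(has_ring_consensus(TOY_RING), has_ring_consensus(TOY_MUTANT))
```

A pattern match of this kind only checks residue spacing; it says nothing about folding or zinc binding, which must be established experimentally.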

Other functions of BRCA1 are discussed in the
following references which are incorporated herein by
reference: Borden et al, EMBO J., 14:1532-1541 (1995);
Lovering et al, Proc. Natl. Acad. Sci. USA, 90:2112-2116
(1993); Koonin et al, Nature Genet., 13:266-268 (1996);
Chen et al, Science, 270:789-791 (1995); Chen et al,
Cancer Research, 56:3168-3172 (1996); Scully et al,
Science, 272:123-126 (1996); Thakur et al, Molecular &
Cellular Biology, 17:444-452 (1997); Scully et al, Cell,
88:265-275 (1997); Chapman et al, Nature, 382:678-679
(1996); Scully et al, Proc. Natl. Acad. Sci. USA,
94:5605-5610 (1997); Marquis et al, Nature Genetics,
11:17-26 (1995); Gudas et al, Cancer Res., 55:4561-4565
(1995); Gudas et al, Cell Growth and Differentiation,
7:717-723 (1996); Vaughn et al, Cell Growth and
Differentiation, 7:711-715 (1996); Marks et al, Oncogene,
14:115-121 (1997); Zabludoff et al, Oncogene, 13:649-653
(1996); Hakem et al, Cell, 85:1009-1023 (1996); Liu et
al, Genes & Development, 10:1835-1843 (1996); Rao et al,
Oncogene, 12:523-528 (1996); Thompson et al, Nature
Genetics, 9:444-450 (1995); Chen et al, J. Biol. Chem.,
271:32863-32868 (1996); Wu et al, Nature Genetics,
14:430-440 (1996); Klug et al, FASEB Journal, 9:597-604
(1995); Saurin et al, Trends in Biochem. Sci., 21:208-214
(1996); Friedman et al, Genes & Development, 10:2067-2078
(1996); Neuhausen & Marshall, Cancer Res., 54:6069-6072
(1994); Schildkraut et al, Am. J. Obstet. Gynecol.,
172:908-913 (1995); FitzGerald et al, N. Engl. J. Med.,
334:143-149 (1996); Ford et al, Lancet, 343:692-695
(1994); Muto et al, Cancer Research, 56:1250-1252 (1996);
Rao et al, Nature Genetics, 14:185-187 (1996); Struewing
et al, Nature Genetics, 11:198-200 (1995); Couch et al,
Human Mutation, 8:8-18 (1996); Holt et al, Nature
Genetics, 12:298-302 (1996); Jensen et al, Nature
Genetics, 12:303-308 (1996); Bradley & Sharan, Nature
Genetics, 13:268-271 (1996).

There is a need in the art for compositions and 
methods useful in the treatment and/or prophylaxis of
cancers caused by loss of, and mutations in, BRCA1. 

Summary of the Invention 

The present invention meets the needs in the art by 
identifying a novel mammalian BRCA1 Associated Protein
(BAP-1) and nucleic acid sequences encoding same. BAP1
is the first nuclear-localized ubiquitin carboxy-terminal 
hydrolase to be identified and is a new tumor suppressor 
gene which functions in the BRCA1 growth control pathway. 
Compositions, both diagnostic and therapeutic, based on
this newly identified protein are provided herein.

Thus, in one aspect, the present invention provides 
a nucleic acid sequence, which is isolated from cellular 
materials with which it is naturally associated. The 
nucleic acid sequence is preferably selected from SEQ ID
NO: 1, or a fragment thereof. Such a fragment may have a
specified biological function as discussed below, or may 
encode a peptide having a similar biological function as 
the intact BAP-1. Homologous nucleotide sequences, and 
modified nucleotide sequences which encode peptides or
proteins which have a similar biological function as the
intact BAP-1, are also included in this aspect of the 
invention. 

In another aspect, the present invention provides a 
mammalian BRCA1 associated protein (BAP-1). In one
preferred embodiment, the protein is human and has the
amino acid sequence of SEQ ID NO: 2. In another
embodiment, a fragment of SEQ ID NO: 2 encodes a
peptide having a similar biological function as the 
intact BAP-1 protein. Amino acid sequences homologous to
SEQ ID NO: 2, and modified amino acid sequences of SEQ ID
NO: 2, which encode peptides or proteins which have a
similar biological function as the intact BAP-1 or a
specified biological function as discussed below, are
also included in this aspect of the invention.

In yet another aspect, the present invention 
provides a polynucleotide molecule, for example, a vector 
or plasmid, that comprises a mammalian BAP-1 nucleic acid 
sequence as defined herein under the control of suitable
sequences which direct and regulate expression of the
BAP-1 nucleic acid sequence. 

In a further aspect, the present invention provides 
a host cell transformed with a polynucleotide molecule or 
vector of the invention. 

In yet a further aspect, the present invention
provides a method of recombinantly expressing BAP-1 or a
peptide fragment thereof, by culturing a recombinant host 
cell according to the invention under conditions which 
permit expression of BAP-1 or a fragment thereof. 

In still a further aspect, the present invention
provides an anti-BRCA1 associated protein (BAP-1)
antibody. 

In yet another aspect, the invention provides a 
diagnostic reagent comprising an antibody of the
invention and a detectable label. Alternatively, a
diagnostic reagent of the invention may comprise a 
nucleic acid sequence of the invention, or a fragment 
thereof, and a detectable label which is associated with 
said sequence. 

In still another aspect, the invention provides a
method of detecting a cancer associated with abnormal
levels of BAP-1 comprising providing a biopsy sample from
a patient suspected of having said cancer and incubating
said sample in the presence of a diagnostic reagent of
the invention.

In a further aspect, the present invention provides
methods of identifying compounds which specifically bind 
to BAP-1 or a fragment thereof. In still a further 
aspect, the present invention provides for compounds or 
drugs produced by use of the above methods. 

Other aspects and advantages of the present 
invention are described further in the following detailed 
description of the preferred embodiments thereof. 

Brief Description of the Drawings 

FIG. 1A illustrates the structural features of the 
BRCA1 gene product. It shows an alignment of RING finger
domains of human BRCA1 [SEQ ID NO: 3] and mouse BRCA1 [SEQ
ID NO: 4] (AA 1-100), RPT-1 (amino acids 12-100 of SEQ ID
NO: 5), a putative lymphocyte specific transcription 
factor having the most closely related RING finger, and 
BARD1 (AA 47-89) [SEQ ID NO: 19]. Asterisks (*) identify 
the Zn-chelating amino acids that form the core of the 
RING finger. Boxed amino acids show regions of identity 
between the RING finger domains of human BRCA1 and the 
other proteins. Alignment was performed by ClustalW 
[Thompson et al, Nucleic Acids Research, 22:4673-4680
(1994)].

FIG. 1B is a schematic map which illustrates the
constructs made when the amino terminal 100 amino acids 
of human BRCA1 (which includes the RING finger domain) 
and the indicated amino acids of the various BRCA1-RF 
mutants and controls (described in Example 1) were fused 
to the LexA DNA-binding domain. The signature C3HC4 
structure is highlighted. 

FIG. 2A provides a comparison of the amino terminal
regions of BAP-1 (FLBAP) [amino acids 1-257 of SEQ ID NO:
2], a C. elegans 3 protein [SEQ ID NO: 16], and human
ubiquitin carboxyl-terminal hydrolase isozymes L1 (human
UBL1) [SEQ ID NO: 17] and L3 (human UBL3) [SEQ ID NO:
18]. Boxed regions indicate areas of greater than 85%
homology. This region contains the active sites of UBL1
and UBL3.

FIG. 2B illustrates the sequences and provides a
comparison of the partial human [SEQ ID NO: 6] and mouse
BAP-1 proteins [SEQ ID NO: 7-9] isolated via the yeast
2-hybrid screens of Example 2. Capital letters represent
BAP-1 amino acids. Lower case letters represent the amino
acids encoded by the vector. Human BAP-1 is fused to the
Gal4 activation domain. Mouse BAP-1 is fused to the VP16
activation domain.

FIGS. 3A-3E provide the nucleic acid [SEQ ID NO: 1]
and amino acid [SEQ ID NO: 2] sequences of the novel
ubiquitin carboxy-terminal hydrolase, BAP-1. The longest
open reading frame which contained the amino acids
defined by the (human) 2-hybrid fusion protein is 2188
nucleotides encoding 729 amino acids. The cDNA also
contains 39 nucleotides of 5'UTR and 1705 nucleotides of
3'UTR. The enzymatic active site is contained within the
first 250 amino acids; the active site residues are
circled. The putative nuclear localization signals (NLS)
are underlined, the highly acidic region is boxed with
heavy lines, the interaction domain is boxed and the
protein fragment used to generate BAP1 polyclonal
antibodies is bracketed (AAs 483-576 of SEQ ID NO: 2).
The conserved amino acids of the ubiquitin COOH-terminal
hydrolase active site consensus are circled (amino acids
91, 169, 184 of SEQ ID NO: 2).

FIG. 3F is a comparison of BAP1 (amino acids 1-261
of SEQ ID NO: 2) with other UCH's: UCH-CAEEL (genbank #
Q09444) (amino acids 1-251 of SEQ ID NO: 20), UCH DROME
(genbank # P35122) [SEQ ID NO: 21] (aa 1-227), YUH1
(genbank # P35127) [SEQ ID NO: 22] (aa 1-236), UCHL-1
(genbank # P09936) [SEQ ID NO: 24] (aa 1-223), UCHL-3
(genbank # P15374) [SEQ ID NO: 23] (aa 1-230). BAP1
(amino acids 630-729 of SEQ ID NO: 2) is further compared
to CAEEL-C08B11.7 (amino acids 238-326 of SEQ ID NO: 20).
The BLAST search algorithm was used to identify proteins
closely related to BAP1 [Altschul et al, J. Mol. Biol.,
215:403-410 (1990)]. The UCH domains of four of these
proteins were aligned with BAP1 using the CLUSTALW
(ver. 1.6) algorithm [Thompson et al, cited above]. Areas
of homology with other UCH's are boxed. Only
CAEEL-C08B11.7 showed any homology outside of the
enzymatic region.

FIG. 3G is a schematic comparison of BAP1 and the
UCH's. The region necessary for the interaction with
BRCA1 (AAs 598-729) is indicated in the diagrams with 
light crosshatching. 

Detailed Description of the Invention

The present invention provides a novel protein,
BRCA1 associated protein-1 (BAP1). BAP1 is a novel,
nuclear localized enzyme which displays the signature
motifs and activities of a ubiquitin carboxy-terminal
hydrolase, i.e., BAP1 cleaves model ubiquitin substrates
in vitro. In fact, BAP1 is the first nuclear-localized
ubiquitin carboxy-terminal hydrolase to be identified.
The ubiquitin hydrolase function of BAP1 implicates the
ubiquitin-proteasome pathway in either the regulation, or
as a direct effector, of BRCA1 function. Thus, BAP1
likely has a broad role in ubiquitin-dependent regulatory
processes within the nucleus, including the emerging role
of ubiquitin conjugation as a subcellular targeting
signal, as well as in transcription, chromatin
remodeling, cell cycle control and DNA
repair/recombination.

BAP1 also enhances the tumor growth suppression
properties of BRCA1 in colony formation assays and does
so in a manner dependent upon the UCH enzymatic domain
and the BRCA1-interaction domain. BAP1 specifically
binds to the wild-type BRCA1 RING finger domain
(BRCA1-RF) both in vitro and in vivo, but not to mutant
BRCA1-RF's, e.g., the C61G or C64G mutated RING fingers
found in tumors from breast cancer kindreds, or other
closely related RING fingers. The interaction between
BAP1 and BRCA1 occurs in vitro and BAP1 mRNA is expressed
in those tissues which also express BRCA1. Thus, BAP1
has a role as a tumor suppressor gene.

As described below, the yeast two-hybrid system was
employed to isolate mouse and human clones of BAP1. The
human BAP1 locus was mapped to human chromosome 3p21.3.
Rearrangements and intragenic homozygous deletions and
mutations of BAP1 have been found in lung carcinomas,
including homozygous deletions found in non-small cell
lung cancers.

Together, this evidence supports the role of BAP1
as a tumor suppressor gene and as a regulator or an
effector in BRCA1 growth control pathways. Both the
specificity of the BRCA1 RING finger-BAP1 interaction and
the fact that independent, tumor-derived missense
mutations in the cysteines in the BRCA1 RING finger
domain abolish interaction with BAP1 provide compelling
evidence for the physiological relevance of this
interaction.

The invention further provides nucleic acid
sequences which encode BAP1 or fragments of BAP1 which
have a biological function, diagnostic and therapeutic
reagents, as well as methods of using BAP1, its nucleic
acid sequences, and antibodies developed thereto. The
nucleic acid sequences, protein, amino acid sequences and
antibodies directed to BAP1 are useful in the detection,
diagnosis and treatment of cancers associated with
inappropriate BAP-1 levels and/or loss of chromosomal
region 3p21.

In one embodiment, the nucleic acid sequence of the
invention is an approximately 3.5 kb cDNA [SEQ ID NO: 1],
encoding BAP-1. BAP-1 is a 729 amino acid protein [SEQ
ID NO: 2] which interacts through its carboxy terminus
with the BRCA1 RING finger domain. In addition to
containing the 250 amino acid amino terminal UCH
catalytic domain, it includes a long carboxy-terminal
extension which is rich in proline, serine and threonine
and contains a short region of extreme acidity in which
12 of 13 amino acids are either Glu or Asp, elements
which may confer a short half-life upon the protein
[Rechsteiner et al, Trends Biochem. Sci., 21:267-271
(1996)]. The extreme carboxy-terminus encodes two
potential nuclear localization signals which overlap the
approximately 125 amino acid BRCA1-interaction domain.
It was this domain that was independently isolated from
mouse and human libraries in the two-hybrid screen of
Example 2 and is predicted to fold into a long
amphipathic helix of coiled-coil character, the structure
of which may be important for BRCA1 interaction.
Truncation into this region or substitution of a proline
for leucine 691 abolishes the BAP1-BRCA1 interaction. A
potential splice variant in BAP1 results in loss of 31
amino acids of the BRCA1 interaction domain and greatly
reduces the ability of BAP1 to bind the BRCA1 RING
finger, further suggesting that the BAP1-BRCA1
interaction is regulated. Thus, the BAP1
carboxy-terminus is tethered to BRCA1 via the RING finger
domain and the UCH catalytic domain is free to
interact with ubiquitin substrates.

Northern analysis showed that BAP-1 is a ~4 kb mRNA
expressed in a variety of tissues and cell lines. The
cDNA encodes a protein of 80 kD predicted molecular
weight. However, expression of the cDNA in vitro or in
COS1 cells generated a protein with an apparent molecular
weight of approximately 91 kDa, suggesting possible
post-translational modifications. Localization of BAP-1 by
cell fractionation indicated that it is predominantly a
nuclear protein. Chromosomal analysis by fluorescent in
situ hybridization (FISH) localized BAP-1 to chromosome
3p21, a genomic region found to be deleted in some breast
cancers. The loss of BAP-1 function, individually or in
tandem with BRCA1, is anticipated to be associated with
breast cancer progression. Thus, the BAP-1 protein may
mediate BRCA1 function and inhibit its oncogenic
activity. These and other aspects of the invention are
discussed in more detail below.
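
The predicted molecular weight quoted above can be checked with a back-of-the-envelope calculation. The Python sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not stated in this specification; the function name is hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue, in daltons.
# This value is an assumption, not a figure from the specification.
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues: int) -> float:
    """Rough protein mass estimate in kDa from residue count alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


bap1_estimate = estimate_mass_kda(729)  # 729 residues, per SEQ ID NO: 2
print(f"Estimated mass: {bap1_estimate:.1f} kDa")
```

Under this assumption the estimate comes to roughly 80 kDa, in line with the predicted weight and below the approximately 91 kDa apparent weight observed on expression.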

BRCA1 is likely a direct substrate for the UCH
activity of BAP1. Thus, in contrast to all of the known
UCHs which are comprised entirely of the UCH domain, the
carboxy-terminal extension of BAP1 provides substrate
and/or targeting specificity for the catalytic function.
Regulated ubiquitination of BRCA1 and subsequent
proteasome-mediated degradation would not be surprising
given that both BRCA1 levels and subnuclear localization
are tightly regulated in the mitotic cell cycle and
during meiosis [Gudas et al, cited above; Scully et al,
cited above; Zabludoff et al, cited above].
BAP1-mediated deubiquitination of BRCA1 would be expected
to stabilize the protein and protect it from
proteasome-mediated degradation. This is consistent with
both the ability of co-transfected BAP1 to enhance the
tumor suppressor effects of BRCA1 in colony formation
assays and the finding of mutations in BAP1 in cancer
cell lines.

The BRCA1-BAP1 association may also serve to target
the UCH domain to other substrates. These substrates may
be bound to other sites on BRCA1. BRCA1 could be
construed as an assembly or scaffold molecule for
regulated assembly of multiprotein complexes, a function
which has been postulated for other tumor suppressor
proteins [e.g. pRb; Sellers et al, Biochim. Biophys.
Acta, 1288:M1-5 (1996); Welch et al, Genes Dev., 9:31-46
(1995)]. BAP1 may thus be a regulator of this assembly
via controlled ubiquitin proteolysis, similar to two
other RING finger-containing proteins involved in
controlled proteolysis processes, i.e., a mouse homologue
of the Drosophila seven-in-absentia (siah; a RING finger
protein) and the herpes virus protein VMW110 RING finger
protein [Everett et al, EMBO J., 16:566-577 (1997)].

The following description defines the aspects of 
this invention in more detail. 

I. Nucleic Acid Sequences 

The present invention provides mammalian nucleic
acid sequences encoding BAP-1. The nucleic acid
sequences of this invention are isolated from cellular
materials with which they are naturally associated. In
one embodiment, a BAP-1 cDNA sequence is provided in SEQ
ID NO: 1 (FIGS. 3A-3E).

Given the cDNA sequences of SEQ ID NO: 1, one of
skill in the art can readily obtain the corresponding
anti-sense strands of these DNA sequences. Further,
using known techniques, one of skill in the art can
readily obtain genomic sequences corresponding to these
DNA sequences or the corresponding RNA sequences, as
desired.

Similarly, the availability of SEQ ID NO: 1 of this
invention permits one of skill in the art to obtain other
species BAP-1 analogs, by use of the nucleic acid
sequences of this invention as probes in a conventional
technique, e.g., polymerase chain reaction. Allelic
variants of these sequences within a species (i.e.,
sequences containing some individual nucleotide
differences from a more commonly occurring sequence
within a species, but which nevertheless encode the same
protein) such as other human variants of BAP-1 SEQ ID NO:
2, may also be readily obtained given the knowledge of
this sequence provided by this invention.

The present invention further encompasses nucleic
acid sequences capable of hybridizing under stringent
conditions [see, J. Sambrook et al, Molecular Cloning: A
Laboratory Manual, 2d ed., Cold Spring Harbor Laboratory
(1989)] to the sequences of SEQ ID NO: 1, their anti-sense
strands, or biologically active fragments thereof.
An example of a highly stringent hybridization condition
is hybridization in 2XSSC at 65°C, followed by a washing
in 0.1XSSC at 65°C for an hour. Alternatively, an
exemplary highly stringent hybridization condition is in
50% formamide, 4XSSC at 42°C. Moderately high stringency
conditions may also prove useful, e.g. hybridization in
4XSSC at 55°C, followed by washing in 0.1XSSC at 37°C for
an hour. An alternative exemplary moderately high
stringency hybridization condition is in 50% formamide,
4XSSC at 30°C.

Also encompassed within this invention are fragments
of the above-identified nucleic acid sequences.
Preferably, such fragments are characterized by encoding
a biologically active portion of BAP-1, e.g., an epitope.
Generally, these oligonucleotide fragments are at least
15 nucleotides in length. However, oligonucleotide
fragments of varying sizes may be selected as desired.
Such fragments may be used for such purposes as
performing the PCR, e.g., on a biopsied tissue sample.
For example, particularly useful fragments of BAP-1 cDNA
[SEQ ID NO: 1] and corresponding sequences include the
open reading frame, nt 40-2226, the nuclear localization
sites, nt 2005 to 2022 and nt 2188 to 2205, a region of
acidity at nt 1225 to 1263, and the BRCA1-RF-interactive
domain at nt 1831 to 2226 of SEQ ID NO: 1.

WO 98/05968 PCT/US97/13684

Other fragments which are contained within the above identified
fragments or which overlap them and demonstrate similar 
biological activities, e.g., those which differ by 1 to 9 
bases, are also desirable. Similarly, other useful
fragments may be readily identified by one of skill in
the art by resort to conventional techniques, such as, by 
deletion mutagenesis, fusion to other proteins, or by 
motif searches in computer databases. In addition, other 
suitable techniques are known. 
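
The nucleotide ranges listed above can be converted to residue positions of SEQ ID NO: 2 once the 39 nucleotides of 5'UTR are taken into account. The following Python sketch shows the arithmetic, assuming the open reading frame starts at nt 40 as stated; the helper name and dictionary labels are hypothetical.

```python
ORF_START_NT = 40  # first nucleotide of the open reading frame (from the text)


def nt_range_to_residues(first_nt: int, last_nt: int) -> tuple[int, int]:
    """Map an in-frame, 1-based cDNA nucleotide range to 1-based residues."""
    if (first_nt - ORF_START_NT) % 3 != 0:
        raise ValueError("range does not start on a codon boundary")
    if (last_nt - first_nt + 1) % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    first_res = (first_nt - ORF_START_NT) // 3 + 1
    last_res = (last_nt - ORF_START_NT + 1) // 3
    return first_res, last_res


# Nucleotide ranges as given in the text for SEQ ID NO: 1.
regions = {
    "open reading frame": (40, 2226),
    "nuclear localization site 1": (2005, 2022),
    "nuclear localization site 2": (2188, 2205),
    "acidic region": (1225, 1263),
    "BRCA1-RF interaction domain": (1831, 2226),
}
for name, (start, end) in regions.items():
    print(name, nt_range_to_residues(start, end))
```

The residue ranges obtained this way (1-729, 656-661, 717-722, 396-408 and 598-729) agree with the amino acid positions given for the corresponding protein fragments elsewhere in this specification.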

The nucleotide sequences of the invention may be
isolated by conventional uses of polymerase chain
reaction or cloning techniques such as those described in
obtaining the murine and human sequences, described
below. Alternatively, these sequences may be constructed
using conventional genetic engineering or chemical
synthesis techniques.

According to the invention, the nucleic acid 
sequences may be modified. Utilizing the sequence data 
in FIGS. 3A-3E [SEQ ID NO: 1] and in the sequence
listing, it is within the skill of the art to obtain
other polynucleotide sequences encoding the proteins of
the invention. Such modifications at the nucleic acid
level include, for example, modifications to the
nucleotide sequences which are silent or which change the
amino acids, e.g. to improve expression or secretion.
Also included are allelic variations, caused by the 
natural degeneracy of the genetic code. 

Also encompassed by the present invention are 
mutants of the BAP-1 gene provided herein. Such mutants
include amino terminal, carboxy terminal or internal
deletions which are useful as dominant inhibitor genes.
Such a truncated, or deletion, mutant may be expressed
for the purpose of inhibiting the activity of the full-length
or wild-type gene.

These nucleic acid sequences are useful for a
variety of diagnostic and therapeutic uses.
Advantageously, the nucleic acid sequences are useful in
the development of diagnostic probes and antisense probes
for use in the detection and diagnosis of conditions
characterized by BRCA1 mutation. Additionally, the BAP-1
gene has been mapped to chromosome 3p21.3. Thus, these
sequences provide a good marker for further analysis of
chromosome 3. The nucleic acid sequences of this
invention are also useful in the production of mammalian,
and particularly, human BAP-1 proteins and peptides.

II. Protein Sequences

The present invention also provides mammalian BAP-1 
polypeptides, peptides or proteins. These proteins are
free from association with other contaminating proteins
or materials with which they are found in nature. In one
embodiment, the invention provides a human BAP-1 [SEQ ID
NO: 2] polypeptide of 729 amino acids having a predicted
molecular weight (MW) of about 81 kD. In another
embodiment, the invention provides partial human and
murine BAP-1 proteins [SEQ ID NO: 6-9] (FIG. 2B).

Also included in the invention are analogs, or
modified versions, of the proteins provided herein.
Typically, such analogs differ by only one to four codon
changes. Examples include polypeptides with minor amino
acid variations from the illustrated amino acid sequences
of BAP-1 (FIGS. 3A-3E) [SEQ ID NO: 2]; in particular,
conservative amino acid replacements. Conservative
replacements are those that take place within a family of
amino acids that are related in their side chains and
chemical properties. Also provided are homologs of the
proteins of the invention which are characterized by
having at least about 85% or higher homology with SEQ ID
NO: 2. Based on the sequence information provided herein,
one of skill in the art can readily obtain BAP-1 from
other mammalian species.
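
The notions of conservative replacement and percent homology can be made concrete with a short sketch. The grouping of amino acids into side-chain families below is one common textbook scheme and is an assumption, since this specification does not enumerate the families; the function names and example peptides are hypothetical, and identity over pre-aligned, equal-length sequences is used here only as a simple stand-in for homology.

```python
# One common side-chain grouping (an assumption; other schemes exist).
FAMILIES = {
    "acidic": set("DE"),
    "basic": set("KRH"),
    "nonpolar": set("AVLIPFMW"),
    "uncharged polar": set("GNQCSTY"),
}


def family_of(residue: str) -> str:
    """Return the name of the side-chain family containing the residue."""
    for name, members in FAMILIES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same side-chain family."""
    return family_of(original) == family_of(replacement)


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


print(is_conservative("D", "E"), is_conservative("D", "K"))
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```

Family membership is only a heuristic: a replacement within a family can still disrupt function, so any analog would need to be tested for the biological activity of interest.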

Further encompassed by this invention are fragments
of the BAP-1 polypeptide. Such fragments are desirably
characterized by having BAP-1 biological activity,
including, e.g., the ability to bind specifically to the
RING finger domain of wild-type BRCA1. These fragments
may be designed or obtained in any desired length,
including as small as about 5-8 amino acids in length.
Such a fragment may represent an epitope of the protein.
Alternatively, the BAP-1 proteins [SEQ ID NO: 2] of the
invention may be modified, for example, by truncation at
the amino or carboxy termini, by elimination or
substitution of one or more amino acids, or by any number
of now conventional techniques to improve production
thereof, to enhance protein stability or other
characteristics, e.g. binding activity or
bioavailability, or to confer some other desired property
upon the protein.

Currently, desirable proteins or peptides correspond
to the nuclear localization sites, residues 656 to 661
and residues 717 to 722 of SEQ ID NO: 2, a region of
extreme acidity, residues 396 to 408 of SEQ ID NO: 2, and the
interactive domain, residues 598 to 729 of SEQ ID NO: 2.
Another suitable fragment, which has homology to
ubiquitin carboxyl-terminal hydrolase, isozyme L3, is
located between about amino acids 1 to about 214 of SEQ
ID NO: 2. Yet another suitable fragment, corresponding
to residues 483 to 576 of SEQ ID NO: 2, has been used to
generate antibodies. Other suitable fragments include
amino acids 1 to 313, 1 to 325, 1 to 352, and 1 to 426 of
SEQ ID NO: 2. Additionally, fragments which are about in
the range of the above amino acid residues, e.g., which
differ by 1 to 5 amino acids, are anticipated to be
particularly desirable. Still other suitable BAP-1
fragments are identified in the Examples or may be
readily identified and prepared by one of skill in the
art using known techniques, such as deletion mutagenesis
and expression.

III. Expression

A. In Vitro 

To produce recombinant BAP-1 proteins of this
invention, the DNA sequences of the invention are
inserted into a suitable expression system. Desirably, a
recombinant molecule or vector is constructed in which
the polynucleotide sequence encoding BAP-1 is operably
linked to a heterologous expression control sequence
permitting expression of the BAP-1 protein. Numerous
types of appropriate expression vectors are known in the
art for mammalian (including human) protein expression,
by standard molecular biology techniques. Such vectors
may be selected from among conventional vector types
including insect, e.g., baculovirus expression, or
yeast, fungal, bacterial or viral expression systems.
Other appropriate expression vectors, of which numerous
types are known in the art, can also be used for this
purpose.

Methods for obtaining such expression vectors
are well-known. See, Sambrook et al, Molecular Cloning.
A Laboratory Manual, 2d edition, Cold Spring Harbor
Laboratory, New York (1989); Miller et al, Genetic
Engineering, 4:277-298 (Plenum Press 1986) and references
cited therein.

Suitable host cells or cell lines for
transfection by this method include mammalian cells, such
as human 293 cells, Chinese hamster ovary cells (CHO),
the monkey COS-1 cell line or murine 3T3 cells derived
from Swiss, Balb-c or NIH mice. Another
suitable mammalian cell line is the CV-1 cell line.
Still other suitable mammalian host cells, as well as
methods for transfection, culture, amplification,
screening, production, and purification are known in the
art. [See, e.g., Gething and Sambrook, Nature,
293:620-625 (1981), or alternatively, Kaufman et al, Mol.
Cell. Biol., 5(7):1750-1759 (1985) or Howley et al, U. S.
Patent 4,419,446].

Similarly, bacterial cells are useful as host
cells for the present invention. For example, the
various strains of E. coli (e.g., HB101, MC1061, and
strains used in the following examples) are well-known as
host cells in the field of biotechnology. Various
strains of B. subtilis, Pseudomonas, other bacilli and
the like may also be employed in this method.

Many strains of yeast cells known to those
skilled in the art are also available as host cells for
expression of the polypeptides of the present invention.
Other fungal cells may also be employed as expression
systems.

Alternatively, insect cells such as Spodoptera
frugiperda (Sf9) cells may be used.

Thus, the present invention provides a method
for producing a recombinant BAP-1 protein which involves
transfecting a host cell with at least one expression
vector containing a polynucleotide of the invention under
the control of a transcriptional regulatory sequence,
e.g., by conventional means such as electroporation. The
transfected or transformed host cell is then cultured
under conditions that allow expression of the BAP-1
protein. The expressed protein may then be recovered,
isolated, and optionally purified from the cell (or from
the culture medium, if expressed extracellularly) by
appropriate means known to one of skill in the art.
For example, the proteins may be isolated in
soluble form following cell lysis, or may be extracted
using known techniques, e.g., in guanidine chloride. If
desired, the BAP-1 proteins of the invention may be
produced as a fusion protein. For example, it may be
desirable to produce BAP-1 fusion proteins, to enhance
expression of the protein in a selected host cell, to
improve purification, or for use in monitoring the
presence of BAP-1 in tissues, cells or cell extracts.
Suitable fusion partners for the BAP-1 proteins of the
invention are well known to those of skill in the art and
include, among others, β-galactosidase, glutathione-S-
transferase, and poly-histidine.
B. In Vivo 

Alternatively, where it is desired that the
BAP-1 protein (whether full-length or a desirable
fragment) be expressed in vivo, e.g., for gene therapy
purposes, an appropriate vector for delivery may be
readily selected by one of skill in the art. Exemplary
gene therapy vectors are readily available from a variety
of academic and commercial sources, and include, e.g.,
adeno-associated virus [International patent application
No. PCT/US91/03440], adenovirus vectors [M. Kay et al,
Proc. Natl. Acad. Sci. USA, 91:2353 (1994); S. Ishibashi
et al, J. Clin. Invest., 92:883 (1993)], or other viral
vectors, e.g., various poxviruses, vaccinia, etc.
Methods for insertion of a desired gene, e.g., BAP-1, and
obtaining in vivo expression of the encoded protein, are
well known to those of skill in the art.

IV. Antisera and Antibodies

The BAP-1 proteins of this invention are also useful
as antigens for the development of anti-BAP-1 antisera
and antibodies to BAP-1 or to a desired fragment of a
BAP-1 protein. Specific antisera may be generated using
known techniques. See, Sambrook, cited above, Chapter
18, generally, incorporated by reference, and Example 5
below. Similarly, antibodies of the invention, both
polyclonal and monoclonal, may be produced by
conventional methods. These techniques may include the
Kohler and Milstein hybridoma technique, recombinant
techniques, such as described by Huse et al, Science,
246:1275-1281 (1988), or any other techniques known to
the art.

Also encompassed within this invention are humanized
and chimeric antibodies. As used herein, a humanized
antibody is defined as an antibody containing murine
complementarity determining regions (CDRs) capable of
binding to BAP-1 or a fragment thereof, and human
framework regions. These CDRs are preferably derived
from a murine monoclonal antibody (MAb) of the invention.
As defined herein, a chimeric antibody is an
antibody containing the variable region light and heavy
chains, including both CDR and framework regions, from a
BAP-1 MAb of the invention and the constant region light
and heavy chains from a human antibody. Methods of
identifying suitable human framework regions and
modifying a MAb of the invention to contain same to
produce a humanized or chimeric antibody of the
invention, are well known to those of skill in the art.
See, e.g., E. Mark and Padlan, "Humanization of
Monoclonal Antibodies", Chapter 4, The Handbook of
Experimental Pharmacology, Vol. 113, The Pharmacology of
Monoclonal Antibodies, Springer-Verlag (June, 1994).
Other types of recombinantly-designed antibodies are also
encompassed by this invention.

Further provided by the present invention are
anti-idiotype antibodies (Ab2) and anti-anti-idiotype
antibodies (Ab3). Ab2 are specific for the target to
which anti-BAP-1 antibodies of the invention bind and Ab3
are similar to BAP-1 antibodies (Ab1) in their binding
specificities and biological activities [see, e.g., M.
Wettendorff et al., "Modulation of anti-tumor immunity by
anti-idiotypic antibodies." In Idiotypic Network and
Diseases, ed. by J. Cerny and J. Hiernaux, Am. Soc.
Microbiol., Washington DC: pp. 203-229, (1990)]. These
anti-idiotype and anti-anti-idiotype antibodies may be
produced using techniques well known to those of skill in
the art. Such anti-idiotype antibodies (Ab2) can bear
the internal image of BAP-1 and bind to BRCA1 in much the
same manner as BAP-1, and are thus useful for the same
purposes as BAP-1.

In general, polyclonal antisera, monoclonal
antibodies and other antibodies which bind to BAP-1 as
the antigen (Ab1) are useful to identify epitopes of
BAP-1, to separate BAP-1 from contaminants in living tissue
(e.g., in chromatographic columns and the like), and in
general as research tools and as starting material
essential for the development of other types of
antibodies described above. Anti-idiotype antibodies
(Ab2) are useful for binding BRCA1 and thus may be used
in the treatment of cancers. The Ab3 antibodies may be
useful for the same reason the Ab1 are useful. Other
uses as research tools and as components for separation
of BAP-1 from other contaminants of living tissue, for
example, are also contemplated for the above-described
antibodies.

V. Diagnostic Reagents and Methods 

Advantageously, the present invention provides
reagents and methods useful in detecting and diagnosing
abnormal levels of BAP-1 (i.e., deficiencies or excesses
thereof) in a patient. Conditions associated with excess
levels of BAP-1 may be indicative of BRCA1 mutations.
Abnormal levels of BAP-1 may be associated with a variety
of cancers, including lung cancer (small cell lung
carcinoma and non-small cell lung carcinoma), breast
cancers, uterine carcinomas, and oral squamous cell
carcinomas, among others.
Thus, the proteins, protein fragments, antibodies,
and polynucleotide sequences (including anti-sense
polynucleotide sequences and oligonucleotide fragments),
and BAP-1 antisera and antibodies of this invention may
be useful as diagnostic reagents. These reagents may
optionally be labelled using diagnostic labels, such as
radioactive labels, colorimetric enzyme label systems and
the like conventionally used in diagnostic or therapeutic
methods. Alternatively, the N- or C-terminus of BAP-1 or
a fragment thereof may be tagged with a viral epitope
which can be recognized by a specific antiserum. The
reagents may be used to measure abnormal BAP-1 levels in
selected mammalian tissue using conventional diagnostic
assays, e.g., Southern blotting, Northern and Western
blotting, polymerase chain reaction (PCR), reverse
transcriptase (RT) PCR, immunostaining, and the like.
For example, in biopsies of tumor tissue, loss of BAP-1
expression in tumor tissue could be directly verified by
RT-PCR or immunostaining. Alternatively, a Southern
analysis, genomic PCR, or fluorescence in situ
hybridization (FISH) may be performed to confirm BAP-1
gene rearrangement.

In one example, as diagnostic agents the
polynucleotide sequences may be employed to detect or
quantitate normal BAP-1. The selection of the
appropriate assay format and label system is within the
skill of the art and may readily be chosen without
requiring additional explanation by resort to the wealth
of art in the diagnostic area.

Thus, the present invention provides methods for the
detection of disorders characterized by inappropriate
BAP-1 levels. The protein, antibody, antisera and
polynucleotide reagents of the invention are expected to
be useful in the following methods. The methods involve
contacting a selected mammalian tissue, e.g., a biopsy
sample or other cells, with the selected reagent,
protein, antisera, antibody or DNA sequence, and measuring
or detecting the amount of BAP-1 present in the tissue in
a selected assay format based on the binding or
hybridization of the reagent to the tissue.

VI. Therapeutic Compositions and Methods

BAP-1 is believed to have a role in modulating the
activity of BRCA1, a tumor suppressor. More
particularly, BAP-1 enzymatic activity is anticipated to
have a role in the persistence of BRCA1 in a cell. For
example, the extended presence of BRCA1, particularly in
high levels, is associated with cell death. Thus, by
adjusting BAP-1 levels in a cell, e.g., by use of BAP-1
or an inhibitor identified by the invention, persistence
of BRCA1 in the cells can thereby be altered. For
example, it may be desirable to adjust BAP-1 levels so as
to enhance BRCA1 persistence in a cell, e.g., a tumor
cell. Alternatively, it may be desirable to adjust
BAP-1 levels so as to increase BRCA1 degradation in the
cell. The compositions and methods useful for the
treatment of conditions associated with inadequate or
undesirable BAP-1 levels are provided. As stated above,
included among such conditions are liver and breast
cancers.

The therapeutic compositions of the invention may be 
formulated to contain an anti-idiotype antibody of the
invention, the BAP-1 protein itself or a fragment 
thereof. The therapeutic composition desirably contains 
0.01 nq to 10 mg protein. These compositions may contain 
a pharmaceutically acceptable carrier. Suitable carriers 



WO 98/05968 



PCT/US97/13684 



24 

are well known to those of skill in the art and include, 
for example, saline. Alternatively, such compositions 
may include conventional delivery systems into which 
protein of the invention is incorporated. Optionally, 
these compositions may contain other active ingredients, 
e.g., chemotherapeutics.

Still another method involves the use of the BAP-1 
polynucleotide sequences for gene therapy. In the 
method, the BAP-1 sequences are introduced into a 
suitable vector for delivery to a cell containing a 
deficiency of BAP-1 and/or to block tumor growth. By 
conventional genetic engineering techniques, the BAP-1 
gene sequence may be introduced to mutate the existing 
gene by recombination or to replace an inactive or 
missing gene. 

Generally, a suitable vector-based treatment 
contains between 1 x 10^-3 pfu to 1 x 10^12 pfu per dose.
However, the dose, timing and mode of administration of 
these compositions may be determined by one of skill in 
the art. Such factors as the age, condition, and the
level of the BAP-1 deficiency detected by the diagnostic 
methods described above, may be taken into account in 
determining the dose, timing and mode of administration 
of the therapeutic compositions of the invention. 
Generally, where treatment of an existing cancer is 
indicated, a therapeutic composition of the invention is 
preferably administered in a site-directed manner and is 
repeated as needed. Such therapy may be administered in 
conjunction with conventional therapies, including 
radiation and/or chemotherapeutic treatments. 

VII. Drug Screening and Development 

The proteins, antibodies and polynucleotide
sequences of the present invention may also be used in
the screening and development of chemical compounds or
proteins which have utility as therapeutic drugs for the
treatment of cancers characterized by BAP-1 and/or BRCA1
mutation or loss. As one example, a compound capable of
binding to BAP-1 and preventing its biological activity
may be a useful drug component for the treatment or
prevention of cancer. The methods described herein may
also be applied to fragments of BAP-1.

Suitable assay methods may be readily determined by
one of skill in the art. Where desired, and depending on
the assay selected, BAP-1 may be immobilized directly or
indirectly (e.g., via an anti-BAP-1 antibody) on a
suitable surface, e.g., in an ELISA format. Such
immobilization surfaces are well known. For example, a
wettable inert bead may be used. Alternatively, BAP-1
may be used in screening assays which do not require
immobilization, e.g., in the screening of combinatorial
libraries.

Assays and techniques exist for the screening and
development of drugs capable of binding to selected
regions of BAP-1. These include the use of a phage display
system for expressing the BAP-1 proteins, and using a
culture of transfected E. coli or other microorganism to
produce the proteins for binding studies of potential
binding compounds. See, for example, the techniques
described in G. Cesarini, FEBS Letters, 307(1):66-70
(July 1992); H. Gram et al., J. Immunol. Meth., 161:169-
176 (1993); C. Summer et al., Proc. Natl. Acad. Sci.
USA, 89:3756-3760 (May 1992), incorporated by reference
herein.

Other conventional drug screening techniques may be
employed using the proteins, antibodies or polynucleotide
sequences of this invention. As one example, a method
for identifying compounds which specifically bind to a
BAP-1 protein can include simply the steps of contacting
a selected BAP-1 protein with a test compound to permit
binding of the test compound to BAP-1; and determining
the amount of test compound, if any, which is bound to
the BAP-1 protein. Such a method may involve the
incubation of the test compound and the BAP-1 protein
immobilized on a solid support.

Typically, the surface containing the immobilized 
ligand is permitted to come into contact with a solution 
containing the BAP-1 protein and binding is measured 
using an appropriate detection system. Suitable
detection systems include the streptavidin horseradish
peroxidase conjugate, or direct conjugation by a tag, e.g.,
fluorescein. Other systems are well known to those of
skill in the art. This invention is not limited by the 
detection system used. 

Another method of identifying compounds which 
specifically bind to BAP-1 can include the steps of 
contacting a BAP-1 protein immobilized on a solid support 
with both a test compound and the protein sequence which 
is a receptor for BAP-1 to permit binding of the receptor 
to the BAP-1 protein; and determining the amount of the 
receptor which is bound to the BAP-1 protein. The 
inhibition of binding of the normal protein by the test 
compound thereby indicates binding of the test compound 
to the BAP-1 protein. 
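
The competition format just described can be summarized numerically. The following is a minimal sketch only: the function name, the readings, and the 50% cut-off are hypothetical illustrations and are not taken from this specification. It computes how far a test compound reduces the amount of receptor bound to immobilized BAP-1 relative to a no-compound control.

```python
def percent_inhibition(bound_without_compound: float, bound_with_compound: float) -> float:
    """Percent reduction in receptor bound to immobilized BAP-1 caused by a test compound."""
    if bound_without_compound <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * (1.0 - bound_with_compound / bound_without_compound)


# Hypothetical detection readings (arbitrary units); not data from this specification.
control_signal = 1.20  # receptor bound when no test compound is present
readings = {"compound_A": 0.30, "compound_B": 1.15, "compound_C": 0.62}
threshold = 50.0  # hypothetical cut-off for calling a compound an inhibitor

# Keep the compounds whose inhibition meets or exceeds the cut-off.
hits = sorted(
    name
    for name, signal in readings.items()
    if percent_inhibition(control_signal, signal) >= threshold
)
print(hits)
```

In practice the cut-off and the detection units would depend on the assay format and detection system chosen.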

Thus, through use of such methods, the present 
invention is anticipated to provide compounds capable of 
interacting with BAP-1 or portions thereof, and either 
enhancing or decreasing its biological activity, as 
desired. Such compounds are believed to be encompassed 
by this invention. 

The assay methods described herein are also useful
in screening for inhibition of the interaction between a
BAP-1 protein of the invention and its ligand(s). The
solution containing the inhibitors may be obtained from
any appropriate source, including, for example, extracts
of supernatants from culture of bioorganisms, extracts
from organisms collected from natural sources, chemical
compounds, and mixtures thereof.

These examples illustrate the preferred methods for
obtaining and using the sequences and compositions of the
invention. These examples are illustrative only and do
not limit the scope of the invention.

Example 1 - Construction of Expression Plasmids 
A. LexA Fusions 

A totally synthetic BRCA1 gene encoding the
amino-terminal 100 amino acids of human BRCA1 (BRCA1-RF),
including the full ring-finger domain, was constructed.
The BRCA1-RF domain was made using long overlapping
oligonucleotides and PCR-mediated overlap-extension gene
synthesis techniques [Madden et al, Science, 252:1550-
1553 (1991)]. Codon usage was optimized for expression
in E. coli and S. cerevisiae [Sharp et al, Nuc. Acids
Res., 16:8207-8211 (1988)] (Figs. 1A and 1B). The
following oligonucleotides were used.

top strand: [SEQ ID NO: 10]
5'-ATGGAACCTGTCTGCTCTGCGTGTTGAAGAAGTTCAAAACGTTATCAACGCTA-
TGCAAAAGATCCTGGAATGTCCAATCTG

bottom strand: [SEQ ID NO: 11]
5'-GGTTCAGCAGCTTCAGCATACAGAACTTACAGAAGATGTGGTCACACTTAGTG-
GAAACTGGTTCCTTGATCAGTTCCAGACAGATTGGACATTCCAGGATC

top strand: [SEQ ID NO: 12]
5'-GTATGCTGAAGCTGCTGAACCAAAAGAAGGGTCCATCTCAATGTCCACTGTG-
TAAGAACGACATCACTAAGCGTTCTCTGCAAGAATCTACTCGTTTCTCTC

bottom strand: [SEQ ID NO: 13]
5'-TTCCAGACCAGTGTCCAGCTGGAAAGCACAGATGATCTTCAGCAGTTCTTCA-
ACCAGTTGAGAGAAACGAGTAGATTCTTG

Double-stranded DNA was generated by 5 cycles
of the polymerase chain reaction (PCR) and the full-
length cDNA was amplified further via PCR using "outside"
primers with homology to the 5' and 3' ends of the DNA
sequence [Madden, cited above]. These primers contained
enzymatic restriction sites for either EcoRI (BRCA1-RF-5'
primer): 5'-GCTAGAATTCACCATGGACCTGTCTCCTCTG [SEQ ID NO:
14] or SalI (BRCA1-RF-3' primer):
5'-GCTAGTCGACTTCCAGACCAGTGTCCAG [SEQ ID NO: 15].
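
Overlap-extension synthesis depends on the 3' end of each oligonucleotide annealing to the 3' end of its neighbor on the opposite strand. As an illustrative sketch (the helper names are hypothetical, and the sequences are copied from the listing above on the assumption that a trailing hyphen marks a line continuation), the annealing region between SEQ ID NO: 10 and SEQ ID NO: 11 can be located as follows.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def annealing_overlap(top: str, bottom: str) -> str:
    """Longest 3' suffix of `top` that matches the start of reverse_complement(bottom)."""
    target = reverse_complement(bottom)
    for k in range(min(len(top), len(target)), 0, -1):
        if top.endswith(target[:k]):
            return target[:k]
    return ""


# Sequences as printed above (assumption: line-end hyphens are continuations).
SEQ_ID_10 = (
    "ATGGAACCTGTCTGCTCTGCGTGTTGAAGAAGTTCAAAACGTTATCAACGCTA"
    "TGCAAAAGATCCTGGAATGTCCAATCTG"
)
SEQ_ID_11 = (
    "GGTTCAGCAGCTTCAGCATACAGAACTTACAGAAGATGTGGTCACACTTAGTG"
    "GAAACTGGTTCCTTGATCAGTTCCAGACAGATTGGACATTCCAGGATC"
)

overlap = annealing_overlap(SEQ_ID_10, SEQ_ID_11)
print(len(overlap), overlap)
```

The same function can be applied to the other adjacent oligonucleotide pairs to check each junction of the synthetic gene.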

The resulting complete, "wild-type" RF domain
was confirmed by sequencing. The resulting BRCA1-RF was
then fused in frame with the LexA DNA-binding domain to
create a LexA-BRCA1-RF fusion construct by cloning the
BRCA1-RF domain into the EcoRI-SalI restriction sites of
the vector pBTM-116 [Vojtek et al, Cell, 71:205-214
(1993)] (see Fig. 1B). This LexA-BRCA1-RF construct was
used as the probe ("bait") to screen for BRCA1-interacting
proteins in a yeast 2-hybrid analysis.
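
The directional cloning described above relies on each "outside" primer carrying the expected restriction site. The sketch below is a simple check under stated assumptions: the primer sequences are those printed for SEQ ID NOS: 14 and 15, the recognition sequences are the standard ones for EcoRI (GAATTC) and SalI (GTCGAC), and the function name is hypothetical.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "SalI": "GTCGAC"}

# Primer sequences as printed for SEQ ID NOS: 14 and 15.
PRIMERS = {
    "BRCA1-RF-5'": "GCTAGAATTCACCATGGACCTGTCTCCTCTG",
    "BRCA1-RF-3'": "GCTAGTCGACTTCCAGACCAGTGTCCAG",
}


def find_sites(seq: str) -> dict:
    """Map each enzyme whose site occurs in `seq` to the 0-based index of its first occurrence."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


for primer_name, primer_seq in PRIMERS.items():
    print(primer_name, find_sites(primer_seq))
```

Each primer should report only the site intended for its end of the insert, which is what allows the fragment to be ligated in a single orientation.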

Negative controls for the
specificity of the interaction in the yeast system were
made (as LexA fusions) by mutating the BRCA1-RF (Figs. 1A
and 1B) as follows:
(i) The Cys61Gly and Cys64Gly substitutions of
BRCA1 which occur in breast cancer pedigrees. BRCA1 RF
domain point mutants, BRCA1-C64G (Cys 64 to Gly) and
BRCA1-C61G (Cys 61 to Gly), were created by
PCR-mutagenesis using the "outside" primers described
above and overlapping oligonucleotides containing the
appropriate nucleotide change: BRCA1-C61G-sense:
5'-CCATCTCAAGGTCCACTGTGTAAG-3' [SEQ ID NO: 25];
BRCA1-C61G-antisense: 5'-CTTACACAGTGGACCTTGAGATGG-3' [SEQ
ID NO: 26]; BRCA1-C64G-sense:
5'-CAATGTCCACTGGGTAAGAACGACATC-3' [SEQ ID NO: 27]; and
BRCA1-C64G-antisense: 5'-GATGTCGTTCTTACCCAGTGGACATTG-3'
[SEQ ID NO: 28] [Ho et al, Gene, 77:51-59 (1989)]. The
BRCA1(C64G)-RF control has a point mutation in the
BRCA1-RF found in a breast cancer kindred [Castilla et
al, Nat. Genet., 8:387-391 (1994)]. This mutation, a
Cys64 to Gly64, destroys one of the Zn chelating residues
leading, presumably, to the loss of correct conformation
of the RING domain.

(ii) The protein equivalent of the delAG185
mutation which results in a frame shift at amino acid 22
followed by 17 out-of-frame amino acids and a stop codon.
The BRCA1-delAG185 mutant was generated by PCR using the
BRCA1-RF-5' oligonucleotide [SEQ ID NO: 14] and a 3'
oligonucleotide that encoded the changed amino acid
sequence: 5'-GCATGGATCCTCAAACCTTGTGCAGGCAGGTACCCTG
GTCAACAGGAGACAGGTGGGAAACCAGGATCTTTTGCATAGC-3' [SEQ ID NO:
29]. The truncated protein generated by the delAG185
mutation is found in high frequency in the Ashkenazi
population [Struewing et al, Nat. Genet., 11:198-200
(1995)].

(iii) A truncated BRCA1 RING finger at amino
acid 31, the result of a PCR error. The BRCA1-del31
truncation mutant was a mis-primed PCR reaction of
BRCA1-RF identified by sequencing during the initial
screens for a wild-type LexA-BRCA1. The BRCA1-RF-trunc
control is a truncation of the BRCA1-RF, a protein of 35
amino acids which ends within the first loop of the RING
domain.

(iv) The RPT-1 RING finger domain. The
LexA-RPT-1 construct (amino acids 1-100) [SEQ ID NO: 5]
was made by PCR-mediated amplification of the nucleotides
representing the first 100 amino acids of the
transcription factor RPT-1 [Patarca et al, Proc. Natl.
Acad. Sci. USA, 2733-2737 (1988); RPT-1 cDNA kindly
provided by Dr. H. Cantor] with the 5' and 3' primers
incorporating EcoRI and SalI restriction sites.

(v) A non-specific control LexA fusion with
RhoB. LexA-RhoB was a kind gift of Dr. George
Prendergast, The Wistar Institute of Anatomy and Biology,
Philadelphia, PA.

All LexA mutant fusion constructs were made, as
described for the wild-type BRCA1-RF, by cloning the
appropriate mutated BRCA1-RF domain into the vector
pBTM-116. The RPT-1 PCR product was enzymatically
digested and ligated into the corresponding sites in
pBTM-116. All clones were confirmed by sequencing.
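
To see which base each mutagenic primer of control (i) changes, the primer can be compared against the synthetic wild-type sequence. This is a sketch under stated assumptions: the wild-type reference is the SEQ ID NO: 12 oligonucleotide as printed in this Example, the primers are the sense primers of SEQ ID NOS: 25 and 27, and the function name is hypothetical. The function slides the primer along the reference and reports the mismatches in the best-matching window.

```python
def locate_mutation(wild_type: str, primer: str) -> list:
    """Return (offset_in_primer, wild_type_base, primer_base) for the best-matching window."""
    best = None
    for start in range(len(wild_type) - len(primer) + 1):
        window = wild_type[start:start + len(primer)]
        diffs = [
            (i, w, p) for i, (w, p) in enumerate(zip(window, primer)) if w != p
        ]
        # Keep the window with the fewest mismatches (first one wins on ties).
        if best is None or len(diffs) < len(best):
            best = diffs
    return best


# SEQ ID NO: 12 as printed (assumption: the line-end hyphen is a continuation).
SEQ_ID_12 = (
    "GTATGCTGAAGCTGCTGAACCAAAAGAAGGGTCCATCTCAATGTCCACTGTG"
    "TAAGAACGACATCACTAAGCGTTCTCTGCAAGAATCTACTCGTTTCTCTC"
)
C61G_SENSE = "CCATCTCAAGGTCCACTGTGTAAG"     # SEQ ID NO: 25
C64G_SENSE = "CAATGTCCACTGGGTAAGAACGACATC"  # SEQ ID NO: 27

print("C61G:", locate_mutation(SEQ_ID_12, C61G_SENSE))
print("C64G:", locate_mutation(SEQ_ID_12, C64G_SENSE))
```

For both primers the only mismatch is a T in the synthetic wild-type sequence replaced by a G, converting a TGT codon to GGT, which agrees with the stated Cys to Gly substitutions.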

The wild-type BRCA1-RF did not display 
intrinsic transcriptional activation function in yeast 
and proper expression of each LexA fusion in yeast was 
confirmed by Western blot analysis with anti-LexA
DNA-binding domain antibody (data not shown).

These controls screen for proteins which
interact only with the wild-type BRCA1-RF and not with
any of the physiologically relevant BRCA1-RF mutations
nor a RING finger that is the most similar to that of the
BRCA1-RF [Miki et al, Science, 266:66-71 (1994)]. Thus,
the controls make it possible to identify proteins which
interact specifically with the BRCA1-RF and not with any
other RING domain.

Example 2 - Yeast Two-Hybrid Screen for BRCA1-RF
Interacting Proteins

To identify the potential protein partners of BRCA1,
a yeast 2-hybrid analysis system as modified by Stan
Hollenberg [Vojtek et al, cited above] was performed
using the RING finger domain of human BRCA1. Guided by
the expression patterns of BRCA1 during mouse development
and in human spleen, the cDNA libraries selected for
screening with the LexA-BRCA1-RF of Example 1 were (1)
the human adult B cell, oligo-dT-primed, cDNA library
[Durfee et al, Genes & Devel., 7:555-569 (1993); a kind
gift of Dr. Steve Elledge] and (2) a whole mouse embryo
(9.5-10.5 day), random-primed, cDNA library size selected
for inserts of 300 to 500 base pairs in length [Vojtek et
al, cited above; kind gift of Dr. Stan Hollenberg].

Briefly, the LexA-BRCA1-RF and a selected library
were co-transformed into the L40 yeast strain. Positive
protein interactions were selected by His auxotrophy.
Fifty colonies were picked and grown for 10 generations
without selection for the LexA-BRCA1-RF plasmid.
Isolated clones of each colony, which were positive for
the presence of only the library plasmid, were picked and
mated with AMR70 yeast containing LexA-BRCA1-RF, one of
its mutants, or one of the LexA controls of Example 1A.
Positive matings were selected by growth on media
requiring the presence of both plasmids. These colonies
were then scored for LacZ production (positive
interaction) and those which were positive for
interaction with the wild-type BRCA1-RF, but not any of
the controls, were processed for further analysis.
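
The scoring rule in the preceding paragraph (LacZ-positive with the wild-type bait, LacZ-negative with every control bait) can be written as a simple filter. The following is a sketch only: the clone names and LacZ calls are hypothetical and are not results from this Example, and the control labels are shorthand for the LexA fusions of Example 1.

```python
# Shorthand labels for the control baits of Example 1.
CONTROLS = ["C61G", "C64G", "delAG185", "del31", "RPT-1", "RhoB"]


def is_specific(lacz_calls: dict) -> bool:
    """True if the clone is LacZ-positive with wild-type bait and negative with all controls."""
    positive_with_wild_type = lacz_calls.get("wild-type", False)
    positive_with_control = any(lacz_calls.get(c, False) for c in CONTROLS)
    return positive_with_wild_type and not positive_with_control


# Hypothetical LacZ calls per library clone (True = blue colony); baits not listed are negative.
clones = {
    "clone_1": {"wild-type": True},
    "clone_2": {"wild-type": True, "RhoB": True},  # binds a non-specific control
    "clone_3": {"wild-type": False},               # no interaction with the bait
}

specific_clones = [name for name, calls in clones.items() if is_specific(calls)]
print(specific_clones)
```

Only clones passing this filter would be carried forward for sequencing and further analysis.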

Using the above assay methods, one hundred yeast
colonies (50 from each library; each screen representing
approximately 8-10 x 10^6 independent cDNAs), randomly
taken from approximately 5-700 total colonies which grew
on solid media lacking the amino acid histidine, were
selected for additional screening.

Thirty-one cDNAs which specifically interacted with
BRCA1-RF were obtained from the secondary screen of the
two libraries. Eight of these (3 from the human library
and 5 from the mouse library) encoded the same amino acid
sequences.

A representative secondary screen of one of the
human clones, hBAP-1 (aa483-729; SEQ ID NO: 6), and 3 of
the mouse clones, mBAP-1 (aa581-720; SEQ ID NO: 8),
mBAP-1 (aa518-(del)-718; SEQ ID NO: 7), and mBAP-1
(aa596-721; SEQ ID NO: 9) was performed by re-introducing
the purified pACT plasmids containing them into naive
yeast. The sequences of these clones are compared in
FIG. 2B. This screen showed that each clone had a
strong interaction with the wild-type BRCA1 ring-finger,
but failed to interact with the C64G, C61G, del31, delAG,
RPT-1, RhoB, or any of the specificity control LexA
fusions (data not shown).

Thus, these clones specifically interact with only
the BRCA1 RING finger. These cDNA clones all encode the
same region of the same protein which has been termed
BRCA1-Associated Protein-1 or BAP-1. Each clone shares
the same translational reading frame with the
transcriptional activation domain to which it is fused.
In addition, the fusion junctions were different among
the clones, suggesting that the interaction was not due to
a fusion-junction artifact.

The longest cDNA retrieved in the two-hybrid screen
was a 2.0 kb clone from the human library and encoded 246
amino acids followed by a 1.3 kb 3'UTR. Each mouse clone
encoded an overlapping, smaller subset of this human open
reading frame, which served to partially map the
minimal interaction domain. Further definition of this
minimal interaction domain was performed by mutagenesis
of this region of BAP-1.

The "minimal interaction domain" was determined by
the shortest mouse clone [mBAP-1 (aa596-721; SEQ ID NO:
9)]. To further define the specificity of interaction
between BRCA1 and BAP-1, carboxy- and amino-terminal
truncation mutants of mBAP-1 were generated by PCR-based
deletion or point mutagenesis.
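
Choosing the shortest interacting clone as the minimal interaction domain is a small piece of coordinate arithmetic. The sketch below uses the mouse clone endpoints given in this Example; the dictionary layout and function name are illustrative assumptions. Spans are counted inclusively from the first to the last residue, so the span of the SEQ ID NO: 7 clone ignores its internal deletion and overstates the residues actually present.

```python
# Mouse clone endpoints (first and last amino acid) as given in Example 2.
MOUSE_CLONES = {
    "mBAP-1 (SEQ ID NO: 8)": (581, 720),
    "mBAP-1 (SEQ ID NO: 7)": (518, 718),  # carries an internal deletion
    "mBAP-1 (SEQ ID NO: 9)": (596, 721),
}


def span(endpoints: tuple) -> int:
    """Inclusive number of residue positions covered between the two endpoints."""
    first, last = endpoints
    return last - first + 1


spans = {name: span(ends) for name, ends in MOUSE_CLONES.items()}
shortest = min(spans, key=spans.get)

for name, length in spans.items():
    print(f"{name}: {length} positions")
print("shortest interacting clone:", shortest)
```

The shortest clone by this count is the SEQ ID NO: 9 clone, which is the one the text uses to define the minimal interaction domain.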

The appropriate region of mBAP-1 (596-721) was
amplified by PCR using a vector primer
(pVP16 5'-primer, 5'-CCGATGCCCTTGGAATTGACGAG-3';
pVP16 3'-primer, 5'-CGATGAATTCGAGCTAGCTTCTATC-3') and the
appropriate truncating oligonucleotide:
Mc43Ct1, 5'-GCATGAATTCTCAGCTCCGGCGCACTGAGATG-3';
Mc43Ct2, 5'-GCATGAATTCTCAAGCCAGCATGGATATGAAGG-3';
Mc43Ct3, 5'-GCATGAATTCTCAGTCATCAATCTTGAACTTC-3';
Mc43Ct4, 5'-GCATGAATTCTCATGCAATCTCGGCTTCTAC-3'; or
Mc43Nt1, 5'-GCATGGATCCCCAAGATTGATGACCAGCGAAGG-3' [SEQ ID
NOS: 30 to 36, respectively].

These oligonucleotides were generated with an
incorporated EcoRI restriction site (for the 3' end
oligos) or a BamHI restriction site (for the 5' end
oligos). After PCR amplification, the product was cut
with BamHI and EcoRI, and then ligated into the mouse
library-yeast expression vector, pVP16 [Vojtek et al,
cited above].

The point mutant mBAP-1 (L691P) was made by standard
PCR-based mutagenesis protocols [Ho et al, Gene, 77:51-59
(1989)], using (Mc43(L691P) sense-primer, 5'-
GCTGGCCAACCCGGTGGAACAG-3' [SEQ ID NO: 37]; Mc43(L691P)
antisense-primer, 5'-CTGTTCCACCGGGTTGGCCAGC-3' [SEQ ID
NO: 38]) and using the same vector primers described
above.
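
In this style of PCR mutagenesis the sense and antisense mutagenic primers are expected to be exact reverse complements of one another. A minimal check is sketched below, assuming the two primer sequences are as printed for SEQ ID NOS: 37 and 38; the function name is hypothetical.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_reverse_complement_pair(sense: str, antisense: str) -> bool:
    """True if `antisense` is the exact reverse complement of `sense`."""
    return sense.translate(COMPLEMENT)[::-1] == antisense


L691P_SENSE = "GCTGGCCAACCCGGTGGAACAG"      # SEQ ID NO: 37
L691P_ANTISENSE = "CTGTTCCACCGGGTTGGCCAGC"  # SEQ ID NO: 38

print(is_reverse_complement_pair(L691P_SENSE, L691P_ANTISENSE))
```

A failed check would usually point to a transcription error in one of the two primers rather than to a problem with the design itself.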

The "minimal interaction domain" was deleted from
the human sequence (the longest clone) and this protein
(hBAP-1 (483-594) [SEQ ID NO: 6]) was also assayed for
interaction with the BRCA1-RF in the yeast 2-hybrid
system.

All clones were confirmed by sequencing and
expression in yeast was confirmed by western analysis
using antibodies against the VP16 activation domain (data
not shown). Each individual mutant was co-transformed
with LexA-BRCA1-RF into L40 yeast and tested for
interaction via its ability to activate transcription
from the LacZ locus.

The mutants showed that deletion of protein sequence
from the carboxy or amino termini of mBAP-1 (aa 596-721;
SEQ ID NO: 9) almost completely destroyed the BAP1-BRCA1
interaction, suggesting a complex interface between the
proteins. Deletion of the last 20 amino acids of mBAP-1
led to a significant reduction in the intensity of
interaction. Further deletions from the COOH-terminus
led to the complete loss of interaction between BRCA1-RF
and BAP-1. A single amino-terminal truncation which
deleted approximately half of mBAP-1 (aa 596-721; SEQ ID
NO: 9) led to an almost complete loss of interaction.
Interestingly, the mBAP(518del718) clone interacted most
poorly with BRCA1-RF and lacked a 93 bp sequence (the
reading frame was maintained), possibly the result of a
naturally occurring splice variant. That these clones
also fail to bind multiple, independent tumor-derived
mutations of the BRCA1-RF provides strong genetic
evidence for their relevance to the functions of BRCA1.

The results of the above experiments suggested that
some critical domain was being disrupted by these
truncations. A careful analysis showed that the region
from amino acids 632 to 729 of SEQ ID NO: 6 may in fact
generate a coiled-coil domain. A point mutation in the
middle of the domain (leucine 691 substituted with a
proline) destroys interaction with the BRCA1 RING
structure. This result is consistent with the
BAP-1/BRCA1 interaction domain being a coiled-coil.

Example 3 - Analysis of BAP-1 cDNA

A nearly full-length cDNA was constructed via a
combination of cDNA library screening, EST database
searching, RACE and RT-PCR (FIGs. 3A-3E) as follows.
Searches of the protein and DNA databases [Altschul et
al, J. Mol. Biol., 215:403-410 (1990)] with the BAP-1
protein/cDNA sequences obtained from the screening of
Example 2, showed no significant matches with any known
protein or cDNA. However, searches of the EST databases
with BAP-1 cDNA yielded several "hits", including one
whose clone had a 5' sequence that overlapped with the 3'
sequence of another EST clone. The clones defined by
these ESTs were obtained from the I.M.A.G.E. consortium
[Lennon et al, Genomics, 32:151-152 (1996); clones #46154
and #40642]. A partial BAP-1 cDNA clone (EST-BAP1) was
generated by digesting clone #40642 with Hind III and Fsp
I and clone #46154 with Fsp I and EcoRI. These two
pieces were then ligated into the Hind III and EcoRI
sites of the vector pcDNA3 (Invitrogen).

Analysis of the IMAGE consortium cDNA and its open
reading frames suggested that this BAP-1 cDNA, as
constructed, was not complete. Reverse-transcriptase PCR
was performed on RNA from normal human fibroblasts using
a gene-specific primer, 5'-GAAGCGGATGTCGTGGTAGG-3' [SEQ
ID NO: 43], and identified 62 nucleotides which were
missing from the "EST-BAP1" cDNA. These 62 nucleotides
were inserted into the "EST-BAP1" cDNA by digestion of
the RT-PCR product with the restriction enzymes KpnI and
AvrII. KpnI cuts at a unique site within the 5' RT-PCR
oligonucleotide:
5'-CCTGTTATTAACCCTCACTAAAGGGAAGGGTACCATGAATAAGGGCTGGCTGGAGC-3'
[SEQ ID NO: 39]; the 3' RT-PCR oligonucleotide was
5'-GAAGCGGATGTCGTGGTAGG-3' [SEQ ID NO: 40].
Ligation of the KpnI/AvrII digested RT-PCR fragment, the
AvrII/EcoRI digested "EST-BAP1" cDNA and the KpnI/EcoRI
digested pcDNA3 produced the full-length BAP-1 cDNA.
Thus, BAP1 cDNA [FIGS. 3A to 3E; SEQ ID NO: 1]
comprises 3525 bp, including a polyA tract with multiple
polyA signals. Conceptual translation yields a long open
reading frame of 729 amino acids [SEQ ID NO: 2] with a
predicted MW of about 81 kDa and pI of 6.3.
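
The length of the open reading frame can be cross-checked against the CDS coordinates given for SEQ ID NO: 1 in the sequence listing (positions 40 to 2226). The following is a minimal sketch of that arithmetic. The average residue mass of 110 Da is a common rule of thumb and an assumption here, not a value from this specification, so the mass it yields is only a rough estimate and not the method used to obtain the predicted MW stated above.

```python
# CDS coordinates for SEQ ID NO: 1 as given in the sequence listing
# (1-based, inclusive).
CDS_START = 40
CDS_END = 2226

# Assumption: rule-of-thumb average mass of one amino acid residue, in Da.
AVG_RESIDUE_MASS_DA = 110.0


def cds_summary(start, end, avg_mass=AVG_RESIDUE_MASS_DA):
    """Return (nucleotides, codons, approximate mass in kDa) for a CDS."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = length_nt // 3
    return length_nt, codons, codons * avg_mass / 1000.0


length_nt, codons, approx_kda = cds_summary(CDS_START, CDS_END)
utr_5prime = CDS_START - 1  # nucleotides upstream of the first codon

print(f"CDS length: {length_nt} nt, {codons} codons")
print(f"5' UTR length: {utr_5prime} nt")
print(f"Rough mass estimate: {approx_kda:.1f} kDa")
```

Under these assumptions the coordinates give 729 codons and a 39-nucleotide 5' UTR, in line with the figures stated in the text.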

The presumptive initiator methionine is within a
favorable context for translation start; however, the
short 5' UTR of 39 bp encodes amino acids in-frame with
the presumptive methionine and does not contain a stop
codon. BLAST searches and a domain analysis [Henikoff &
Henikoff, Genomics, 19:97-107 (1994)] indicated that BAP1
is a novel protein with motifs suggestive of function.

The amino-terminal amino acids 1-240 of SEQ ID NO: 2
show significant homology to a class of thiol proteases,
designated ubiquitin C-terminal hydrolase (UCH),
particularly Isozyme L3, which are implicated in the
proteolytic processing of ubiquitin [Wilkinson et al,
Science, 246:670-673 (1989)]. These enzymes play a key
role in protein degradation via the ubiquitin-dependent
proteasome pathway. The most closely related UCH is a
hypothetical protein from C. elegans, UCH-CAEEL, which
shares 63% similarity (40% identity) with BAP1 through
the UCH domain and is also likely to be a UCH enzyme.
Pairwise similarities to other mammalian UCHs of 54%
(UCHL3) and 56% (UCHL1) have also been found. Most
importantly, the residues which form the catalytic site
of BAP1 (Q85, C91, H169, and D184 of Figs. 3A-3E; SEQ ID
NO: 2) are completely conserved, including the FELDG
motif [Larsen et al, Biochemistry, 35:6735-6744 (1996)].
In addition, a loop of highly variable sequence, which is
disordered in the crystallographic structure of human
UCH-L3 [Johnston et al, EMBO J., 16:3787-3796 (1997)], is
present (residues 140 to 167 of SEQ ID NO: 2). This loop
may occlude the active site or provide substrate
specificity for the enzyme.
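
The similarity and identity percentages quoted above are column counts over a pairwise alignment. As an illustration only, the sketch below computes both figures for two pre-aligned strings. The conservative-substitution groups are a crude assumption made for this example and are not the scoring scheme used to obtain the values in the text. The first string is residues 89-98 of SEQ ID NO: 2 as shown in the sequence listing; the second is a made-up partner sequence.

```python
# Assumption: crude conservative-substitution groups, for illustration only.
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]


def identity_and_similarity(a, b):
    """Return (% identity, % similarity) over ungapped alignment columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore any column in which either sequence has a gap.
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(x == y for x, y in columns)
    similar = sum(
        x == y or any(x in g and y in g for g in SIMILAR_GROUPS)
        for x, y in columns
    )
    n = len(columns)
    return 100.0 * identical / n, 100.0 * similar / n


bap1_fragment = "NSCATHALLS"  # residues 89-98 of SEQ ID NO: 2
hypothetical = "NSCGTIAL-S"   # invented sequence, not a real UCH
ident, simil = identity_and_similarity(bap1_fragment, hypothetical)
print(f"identity {ident:.1f}%, similarity {simil:.1f}%")
```

Identical residues count toward both figures, which is why similarity is never lower than identity.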

BAP1 has a number of additional motifs: a region of
extreme acidity spanning amino acids 396 to 408 of SEQ ID
NO: 2, as well as multiple potential phosphorylation
sites and N-linked glycosylation sites. The C-terminal
one-third is highly charged and is rich in proline,
serine and threonine. The extreme C-terminus contains
two putative nuclear localization signals, KRKKFK and
RRKRSR (aa 656-661 and aa 717-722 of SEQ ID NO: 2), and
is hydrophilic; it is predicted to fold into a helical
(possibly coiled-coil) structure. Indeed, within the
BAP1 minimal interaction domain (i.e., from about amino
acid 596 to 729 of SEQ ID NO: 2), the mutation of leucine
691 to a proline, a change predicted to disrupt the
helical nature of this region, abolished the BAP1-BRCA1
interaction, consistent with the hypothesis that BAP1
uses a coiled-coil domain to interact with the RING
finger domain of BRCA1. This overall architecture
suggests that BAP1 is a new, structurally complex, and
nuclear localized member of the UCH enzyme family.
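
Putative nuclear localization signals such as KRKKFK and RRKRSR are short clusters of basic residues. The sketch below shows one simple way such clusters might be flagged in a protein sequence. The window size and threshold are assumptions chosen for illustration, the test sequence is invented, and this is not the motif analysis actually used for BAP1.

```python
def basic_clusters(seq, window=6, min_basic=4):
    """Return (1-based start, window) pairs rich in lysine/arginine.

    A window is reported when at least `min_basic` of its residues are
    K or R. Overlapping windows are all reported.
    """
    hits = []
    for i in range(len(seq) - window + 1):
        segment = seq[i:i + window]
        if sum(residue in "KR" for residue in segment) >= min_basic:
            hits.append((i + 1, segment))
    return hits


# Invented flanking residues around the first motif named in the text.
toy_sequence = "AAAKRKKFKAAA"
for start, segment in basic_clusters(toy_sequence):
    print(start, segment)
```

Because overlapping windows are reported, a single motif typically produces several adjacent hits; merging them into one region would be a natural refinement.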

Example 4 - In Vitro Protein Association 

The direct interaction of the BRCA1-RF with BAP-1
was confirmed by binding of the BRCA1-RF to the fusion
proteins of glutathione-S-transferase with BAP1.

A. The BAP/GST constructs

The original B cell library two-hybrid BAP-1
clone obtained from the screening experiments described
in Example 2 was pACT-hBAP1(483-729), which contained
BAP1 amino acids 483-729 (nucleotides 1486 to 3525) [SEQ
ID NOs: 2 and 1, respectively] in the pACT plasmid
backbone. The glutathione S-transferase/BAP1 fusion
protein, GST-hBAP1(483-729 of SEQ ID NO: 2), was
generated by cloning nucleotides 1486 to 3525 of SEQ ID
NO: 1 from that original clone into pGEX-5X-1 (Pharmacia
Biotech, Inc.).

Another BAP1 construct which lacked the minimal
BRCA1 interaction domain, pACT-hBAP1(483-594 of SEQ ID NO:
2), was generated and amplified by PCR using a pACT 5'
vector primer 5'-GATGTATATAACTATCTATTCG-3' [SEQ ID NO:
41] and the BAP1-trunc. oligonucleotide:
5'-GCATAGATCTTCACCCCTGGCTGCCTTGGATTGG-3' [SEQ ID NO: 42],
which amplifies BAP1 nucleotides 1486-1821 of SEQ ID NO: 1.
The resulting sequence was digested with restriction
enzymes and ligated into the vector pACT. Another fusion
protein, GST-hBAP1(483-594 of SEQ ID NO: 2), lacking the
minimal interaction domain, was generated in the
same manner as pACT-hBAP1(483-594 of SEQ ID NO: 2),
described above, but fused to GST.

GST and the BAP1 fusion constructs
GST-hBAP1(483-729 of SEQ ID NO: 2) and GST-hBAP1(483-594
of SEQ ID NO: 2) were expressed in E. coli and then
purified [Frangioni et al, Anal. Biochem., 210:179-187
(1993)]. 35S-LexA-BRCA1-RF and 35S-BRCA1 were produced
in vitro via coupled transcription/translation (TNT®,
Promega Corp., Madison, WI) in the presence of 35S-Met.

B. Association Assay

Association between the proteins was assayed
essentially as described by Barlev et al, J. Biol. Chem.,
270:19337-19344 (1995). Briefly, each GST resin was
incubated with the LexA-BRCA1-RF in 100 µL of incubation
buffer (PBS containing 0.2 mM ZnSO4, 0.05% NP-40 and 1 mM
PMSF) for 1 hour at 4°C followed by a second hour at room
temperature. The resin and associated proteins were then
washed in incubation buffer twice (1 mL at room
temperature for 15 minutes) followed by four washes in
PBS containing 300 mM NaCl, 0.2 mM ZnSO4, 0.1% NP-40 and
1 mM PMSF. The associated proteins which remained bound
to resin were eluted from the resin two times (15
minutes), each with 250 µL of elution buffer (100 mM
TRIS, pH 8.0, 150 mM NaCl, 0.1% NP-40, 20 mM reduced
glutathione). The two elutions were combined,
concentrated to a volume of approximately 20 µL of a
50:50 resin slurry, analyzed by SDS-PAGE and
visualized by Coomassie blue staining and fluorography.

Association of the BRCA1-RF with BAP1 was
confirmed in vitro by specific binding of 35S-labeled
LexA-BRCA1-RF to GST-hBAP1(483-729 of SEQ ID NO: 2)
fusion protein, but not to GST alone, confirming a
physical association of the two proteins.

To confirm that the association of the BRCA1-RF
to BAP1 was not an artifact of using only a portion of
BRCA1, full length BRCA1 was expressed in vitro and
incubated with GST and GST-hBAP1(483-729 of SEQ ID NO:
2). As a further control for the specificity of the
interaction, BRCA1 was also incubated with
GST-hBAP1(483-594 of SEQ ID NO: 2), the GST-BAP1 fusion
protein lacking the minimal interaction domain.

The BRCA1 protein specifically bound to
GST-hBAP1(483-729 of SEQ ID NO: 2) and not to GST or
GST-hBAP1(483-594 of SEQ ID NO: 2), confirming the direct
interaction of BRCA1 with BAP1 through the C-terminal
region of BAP1.

Example 5 - Generation of Antibodies

Oligonucleotide primers (pACT 5'-vector primer
5'-GATGTATATAACTATCTATTCG-3' [SEQ ID NO: 44]; BAP1 3'
primer (antibody) 5'-CGTAGTCGACTGTCAGCGCCAGGGGACTC-3' [SEQ
ID NO: 45]) were used to amplify the portion of the BAP1
cDNA [SEQ ID NO: 1] corresponding to amino acids 483 to
576 of SEQ ID NO: 2 via PCR cloning. The PCR product was
then digested with the appropriate restriction enzymes
and ligated to the COOH-terminus of 6 Histidine residues
of the vector pQE-30 (QIAGEN Inc.).

The His-tagged protein was purified from E. coli
over a Ni-agarose column as described [Friedman et al,
cited above] and was used to immunize rabbits for the
production of polyclonal antibodies (Cocalico
Biologicals, Inc.).

Example 6 - Protein Expression of BAP1 

COS-1 cells were grown at 37°C, 5% CO2 in DMEM
supplemented with 10% fetal bovine serum and 2 mM
L-glutamine. COS-1 cells were transiently transfected
using DOSPOR transfection reagent (Boehringer Mannheim
Biochemicals) following the manufacturer's protocol with
plasmids containing the BAP1 cDNA, e.g., pACT-hBAP1(483-
729 of SEQ ID NO: 2). The BAP1 cDNA was transcribed and
translated in vitro in the presence of 35S-Methionine.
35S-labeled cytosolic and nuclear extracts were then
prepared from transiently transfected COS-1 cells.
Immunoprecipitation of BAP1 was performed by
previously described procedures for the metabolic
labeling and immunoprecipitation of proteins from cell
lysates [Morris et al, Oncogene, 6:2339-2348 (1991);
Rauscher et al, Science, 240:1010-1016 (1988); Friedman
et al, cited above] with either pre-immune or anti-BAP1
sera described in the above example.

As a control for nuclear localization, KAP-1, a
co-repressor of transcription localized to the nucleus
[Friedman et al, Genes Dev., 10:2067-2078 (1996)], was
also immunoprecipitated from these cell fractions.
Immunoprecipitation of this product with anti-BAP-1
antiserum confirmed that the protein expressed in vitro
from the cDNA resulted in a polypeptide that contained
the antigen used to raise the antibodies produced as
described above. BAP-1 was found primarily in the
nuclear fraction although a significant amount was
detected in the cytosol. However, this may be an
artifact of the cell fractionation procedure, since KAP-1
was also found to be present in both cytosolic and
nuclear fractions and in approximately the same ratio as
BAP-1.

The expression of the BAP-1 cDNA in COS-1 cells in
vitro followed by immunoprecipitation of 35S-labeled
whole cell extract and analysis by SDS-PAGE also yielded
a single major protein with an apparent molecular weight
of about 91 kDa. However, the largest BAP1 open reading
frame encodes a protein of about 81 kDa predicted
molecular weight. The difference between apparent and
predicted molecular weights may be accounted for by
unusual properties of the C-terminus or by
post-translational modifications.

Example 7 - Tissue and Cellular Expression of BAP-1

A. BAP1 is Expressed in a Variety of Tissues

The direct interaction between BAP1 and BRCA1
illustrated in Example 4 suggests that BAP1 might be
expressed in an overlapping subset of tissues expressing
BRCA1 and that the subcellular location of BAP1 and BRCA1
may be the same.

The expression of BAP1 in a variety of human
adult tissues was determined by Northern blot analysis.
Northern blot hybridizations were performed as follows:
Ten µg total RNA from multiple tissue RNA blots (Clontech
Laboratories, Inc., Palo Alto, CA) was
electrophoretically gel-fractionated and transferred to
Hybond N+ membranes (Amersham). The tissues represented
were heart, brain, placenta, lung, liver, skeletal
muscle, kidney, pancreas, spleen, thymus, prostate,
testis, ovary, small intestine, colon and peripheral
blood lymphocytes.

The protocols for hybridization of cDNA probes
to RNA were performed as described (Clontech
Laboratories, publication PR48380). Blots were
hybridized with a 2.0 kbp 32P-labeled hBAP1 cDNA
(aa 483-729; nucleotides 1486 to 3525) followed by washes
under standard conditions and detection by
autoradiography. Blots were also subsequently probed
with a muscle actin cDNA.

The results indicated that the mRNA encoding
BAP1 was present as a single mRNA species of about 4 kb
in all tissues except testis, where a second mRNA of
about 4.8 kb was also detected. Highest expression was
detected in testis, placenta and pancreas with varying
levels detected in the remaining tissues. Expression of
BAP1 in normal breast tissue was confirmed by RT-PCR of
total RNA isolated from normal human mammary epithelial
cells (HUMEC; data not shown). The level and pattern of
tissue expression shown by BAP1 is similar to that shown
by BRCA1 [Miki et al, cited above].

Northern blot analysis was also performed on
several tumor cell lines representing a variety of tissue
types. The cell line RNA blot was prepared by standard
methods (Sambrook et al, cited above) with 20 µg of total
RNA. Equivalent loading of RNA was confirmed by ethidium
bromide staining. Hybridization of cDNA probes to RNA
was performed using the Clontech protocols. This
hybridization also showed a single mRNA species. The
colon cell lines HT29 [ATCC HTB 28] and SK-Co-1 [ATCC HTB
39] showed no BAP-1 mRNA, suggesting some defect in the
BAP-1 gene in these particular cell lines since colon
tissue shows good expression of BAP-1.

B. BAP1 is a Nuclear Protein

The location of BAP1 as a nuclear protein
within the cell was determined by immunofluorescence
microscopy performed as previously described [Ishov et
al, J. Cell Biology, 111:815-826 (1996)]. HEP2
epithelial cells were grown at 37°C, 5% CO2 in DMEM
supplemented with 10% fetal bovine serum (FBS) and 2 mM
L-glutamine, and cells were transfected using DOSPOR
transfection reagent (Boehringer Mannheim Biochemicals)
following the manufacturer's protocol via electroporation
with the pcDNA3 vector (Invitrogen, Inc.) carrying the
BAP1 cDNA.

Transfectants were analyzed by
immunofluorescence staining with anti-BAP1 polyclonal
antibodies, which in turn were detected with FITC using
biotin-avidin enhancement. Cells were stained for DNA
with bis-benzimide (Hoechst 33258, Sigma Chemical Co.)
and mounted using Fluoromount G (Fisher Scientific).
Analysis was performed with a confocal scanning
microscope (Leica, Inc.).

Detection of BAP1 by confocal microscopy
located BAP1 almost exclusively in the nucleus of the
cell, consistent with its association with BRCA1 and the
presence of two nuclear localization signals in the BAP1
protein sequence.

C. BAP1 is Located on Chromosome 3p21.3 and is
Mutated in Non-Small Cell Lung Carcinoma.

To determine whether BAP1 was located at a
chromosomal region routinely mutated in breast cancer and
thus may be a tumor suppressor gene, the deletion of
which plays a critical role in tumor pathogenesis,
full-length BAP-1 cDNA was used in fluorescent in situ
hybridization (FISH) of partial metaphases. FISH was
performed as described previously [Tommerup and Vissing,
Genomics, 22:259-264 (1995)] using a biotin-labelled 3.5
kb cDNA (full-length) clone of BAP-1, with corresponding
DAPI-stained chromosome banding. Localization of BAP1
was based on the DAPI-band pattern and measurement of the
relative distance from the short arm telomere to the
signals (FLpter value).

BAP1 maps to chromosome 3p21.3. Specific
signals were observed only on the midportion of the short
arm of chromosome 3 with 42 of 69 analyzed metaphase
spreads showing at least one specific signal. The FLpter
value was 0.27 ± 0.02, corresponding to a localization
for BAP1 at 3p21.2-p21.31. This location is a region of
LOH for breast cancer as well as a region frequently
deleted in lung carcinomas [Buchhagen et al, Int. J.
Cancer, 57:473-479 (1994); Thiberville et al, Int. J.
Cancer, 64:371-377 (1995)].
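
The FLpter value is the fractional distance of the hybridization signal from the short-arm telomere (pter), relative to the full length of the chromosome. A minimal sketch of the calculation follows. The individual measurements are hypothetical numbers in arbitrary units, invented to show the arithmetic, and are not data from this example.

```python
from statistics import mean, stdev


def flpter(distance_from_pter, chromosome_length):
    """Fractional length from pter: 0 at the short-arm telomere, 1 at qter."""
    if chromosome_length <= 0:
        raise ValueError("chromosome length must be positive")
    if not 0 <= distance_from_pter <= chromosome_length:
        raise ValueError("signal must lie on the chromosome")
    return distance_from_pter / chromosome_length


# Hypothetical measurements: (signal distance from pter, total length).
measurements = [(2.6, 10.0), (2.9, 10.4), (2.5, 9.8)]
values = [flpter(d, total) for d, total in measurements]
print(f"FLpter = {mean(values):.2f} +/- {stdev(values):.2f}")

# Fraction of metaphase spreads with a specific signal, from the text.
print(f"{42 / 69:.0%} of spreads showed a specific signal")
```

Expressing the position as a fraction makes measurements comparable between metaphase spreads with different degrees of chromosome condensation.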
Example 8 - Mutational Analysis of BAP1

The chromosomal location of BAP1 suggested the
possibility of mutations within BAP1 in lung and breast
tumors. Thus, a variety of tumor cell lines were
screened for mutations within the BAP1 gene by Southern,
Northern and PCR-based SSCP analyses.

A. RNA/DNA Preparation

Genomic DNA from a panel of small cell lung
cancer (SCLC), non-small cell lung cancer (NSCLC), breast
cancer, and lymphoblastoid cell lines was prepared using
standard methods. All cell lines were identified by
their NCI number [Phelps et al, J. Cell. Biochem. Suppl.,
21:32-91 (1996)]: H727, H1466, H226, H526, H841, H1045,
H289, BL1672, BL1770, H289, H847, H920, H1450, H1573,
H1155, H1299, H1693. Total RNA was extracted by the
cesium chloride ultracentrifugation method [Ausubel et
al, Current Protocols in Molecular Biology, J. Kaaren
ed., John Wiley & Sons, Inc. (1987)]. First strand cDNAs
were synthesized from RNA by M-MLV reverse transcriptase
(Gibco BRL) according to the manufacturer's instructions.

B. Single Strand Conformational Polymorphism
(SSCP) Analysis

Seventeen overlapping PCR primer pairs, each
with a predicted product size of approximately 200 base
pairs, were designed to span the 2.2 kb open reading
frame of the BAP1 cDNA sequence. cDNA (from RNA) was
amplified in 20 µl PCR reactions containing 20 mM Tris-HCl
(pH 8.3), 50 mM KCl, 1.5 mM MgCl2, 0.2 mM each dNTP, 0.1 mM
each forward and reverse primer, 0.05 ml 32P-α dCTP, and
0.5 units Taq DNA Polymerase (BRL). PCR reactions were
carried out in a Perkin-Elmer 9600 Thermocycler using a
touchdown technique: a 2.5 minute initial denaturation at
94°C was followed by 35 cycles of denaturation at 94°C x
30s, annealing (initially at 65°C, decreasing by 1°C for
each of the first ten cycles to 55°C) x 30s, and
extension at 72°C x 30s, with a final extension of 5
minutes at 72°C. PCR products were then diluted 1:10
with SSCP dye (95% formamide, 20 mM EDTA, and 0.05% each
of bromophenol blue and xylene cyanol), heat denatured,
and electrophoresed on 0.5X MDE gels +/- 10% glycerol.
Abnormal single stranded DNA detected as autoradiographic
shifts were reamplified by PCR and subjected to automated
dye-terminator sequencing (ABI 373).
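
The touchdown annealing profile described above can be written out cycle by cycle. The sketch below is one reading of that description, and the exact cycle at which 55°C is first reached is an assumption: here cycle 1 anneals at 65°C, each following cycle is 1°C lower until the 55°C floor is reached, and the remaining cycles stay at 55°C.

```python
def touchdown_annealing(cycles=35, start_c=65, floor_c=55, step_c=1):
    """Return the annealing temperature (deg C) for each PCR cycle."""
    return [max(floor_c, start_c - step_c * i) for i in range(cycles)]


schedule = touchdown_annealing()
for cycle, temp in enumerate(schedule, start=1):
    # Denaturation and extension steps are fixed at 94 and 72 deg C.
    print(f"cycle {cycle:2d}: 94C x 30s, {temp}C x 30s, 72C x 30s")
```

Starting above the expected primer melting temperature and stepping down favors specific priming in the early cycles, so that the correct product dominates by the time the less stringent temperature is reached.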

SSCP analysis showed a homozygous shift in H1466
detected by RT-PCR amplification spanning nts 1089 to
1286 (primers: sense 5'-CAACCCCACTCCCATTGTC-3' [SEQ ID
NO: 46]; antisense 5'-GAGTTGGTGTTCTGCACGTC-3' [SEQ ID NO:
47]). Automated sequencing revealed a homozygous 8 base
pair frameshift deletion in the NCI-H1466 cDNA, predicted
to encode a truncated 393 amino acid BAP1 protein. This
homozygous deletion was confirmed to be present in
genomic DNA from the same cell line. In the NCI-H226
line, only the 2.4 kb band and an aberrant 2.6 kb band
were detected.
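
Whether a deletion disrupts the reading frame depends only on whether its length is a multiple of three, which is why the 8 bp deletion in NCI-H1466 is a frameshift while the 93 bp gap in the mBAP(518del718) clone of Example 2 kept the frame. The sketch below illustrates this on the first 30 nucleotides of the coding sequence of SEQ ID NO: 1. The deletion position used here is arbitrary and chosen for illustration; it is not the site of the NCI-H1466 deletion.

```python
def deletion_effect(deleted_nt):
    """Classify a deletion by its effect on the reading frame."""
    return "in-frame" if deleted_nt % 3 == 0 else "frameshift"


def codons(seq):
    """Split a sequence into complete codons, dropping any remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def delete(seq, start, length):
    """Remove `length` nucleotides beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


# First ten codons of the CDS of SEQ ID NO: 1 (from the sequence listing).
cds_start = "ATGAATAAGGGCTGGCTGGAGCTGGAGAGC"

print(deletion_effect(8), deletion_effect(93))
print(codons(cds_start))
# Illustrative 8-nt deletion right after the start codon (arbitrary site).
print(codons(delete(cds_start, 3, 8)))
```

After the frameshift every downstream codon is read in a new frame, so the encoded protein diverges from that point and usually terminates early at the first stop codon encountered.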

B. Northern Analysis

These cell lines were subjected to Northern
blot analysis and EcoRI digestion and then hybridized to
a full-length BAP1 cDNA probe. A single 23 kb band was
detected in the lymphoblastoid and most tumor cell lines
(data not shown). One NSCLC line, NCI-H226, did not show
the 23 kb band but did show an aberrant 30 kb band (data
not shown).

Further mutational analysis was performed by
screening a panel of lung cancer and lymphoblastoid cell
lines for expression of BAP1 mRNA. Northern blot
hybridization showed that most cell lines expressed a
single 4 kb mRNA. A fainter (5.0 kb) band was visible,
corresponding to cross-hybridization with the 28S
ribosomal component. However, two cell lines, NCI-H226
and NCI-H1466 (both NSCLCs), showed undetectable levels
of BAP1 expression, suggesting that BAP1 may play a
critical role in NSCLC pathogenesis.

C. Southern Analysis

To further characterize this potential genomic
rearrangement, genomic DNA from NCI-H226 and a smaller
number of lung cancer and lymphoblastoid lines were
subjected to Southern blot hybridization. Briefly, five
µg of genomic DNA was subjected to restriction enzyme
digestion with BamHI. Using the full-length BAP1 cDNA
probe, four distinct bands at 7.5 kb, 4.0 kb, 3.0 kb, and
2.4 kb were detected which were present in all cell lines
tested with the exception of NCI-H226. The non-small
cell lung cancer NCI-H226 line shows an absence of the
7.5 kb, 4.0 kb, and 3.0 kb bands. An aberrant 2.6 kb band is
detected in the NCI-H226 cell line.
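
The comparison of the NCI-H226 banding pattern with the reference pattern amounts to two set differences. The short sketch below uses the BamHI band sizes reported above; the function and variable names are illustrative.

```python
# BamHI band sizes (kb) reported above for the full-length BAP1 cDNA probe.
REFERENCE_BANDS_KB = {7.5, 4.0, 3.0, 2.4}
H226_BANDS_KB = {2.4, 2.6}


def compare_bands(reference, sample):
    """Return (bands missing from the sample, bands unique to the sample)."""
    missing = sorted(reference - sample, reverse=True)
    aberrant = sorted(sample - reference, reverse=True)
    return missing, aberrant


missing, aberrant = compare_bands(REFERENCE_BANDS_KB, H226_BANDS_KB)
print("missing in NCI-H226:", missing)
print("aberrant in NCI-H226:", aberrant)
```

Exact matching of sizes is adequate for values read from the text; sizes estimated from a real blot would need a tolerance.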

These data clearly show that genetic alterations,
including intragenic homozygous deletions, occur in BAP1.

Example 9 - BAP1 Augments the Growth Suppressive Activity
of BRCA1

To determine whether BAP1 may affect cell growth
itself or may affect BRCA1-mediated changes in cell
growth, BRCA1 and BAP1 cDNAs were co-transfected into
MCF7 breast cancer cells. This cell line was chosen for
several reasons. It has been previously shown that these
cells are inhibited by the overexpression of BRCA1 [Holt
et al, cited above]. Both Northern and RT-PCR analyses
showed that BAP1 was expressed in this cell line (data
not shown); and analysis of the open reading frame from
BAP1 cDNA prepared from this cell line showed no
mutations (data not shown).

MCF7 cells grown at 37°C, 5% CO2 in DMEM
supplemented with 10% FBS and non-essential amino acids
were transfected with the following plasmid pairs:
(a) empty plasmids pcDNA3 and pCMV5; (b) pcDNA3 and
pCMV5-BAP1; (c) pcDNA3 and pCMV5-BAP1(165-729 of SEQ ID
NO: 2); (d) pcDNA3-BRCA1 and pCMV5; (e) pcDNA3-BRCA1 and
pCMV5-BAP1; (f) pcDNA3-BRCA1 and pCMV5-BAP1(165-729 of
SEQ ID NO: 2); (g) pcDNA3-BRCA1-Δ11 and pCMV5; (h)
pcDNA3-BRCA1-Δ11 and pCMV5-BAP1; and (i) pcDNA3-BRCA1-Δ11
and pCMV5-BAP1(165-729 of SEQ ID NO: 2) by a modified
CaPO4-DNA precipitation method [Holt et al, cited above].

MCF7 cells, at 2 x 10^6 cells/10 cm dish, were fed
fresh medium approximately 3 hours prior to transfection
and were then treated with the Ca-DNA precipitate for 4
hours. The cells were subjected to a brief shock with
transfection buffer containing 15% glycerol. Twelve to
sixteen hours later, the cells were trypsinized, counted
and plated directly into complete medium containing 0.75
mg/mL G418 at 5 x 10^5 cells per 10 cm dish. Cells were fed
fresh medium containing G418 every three to four days.
Cells were stained for colonies approximately 21 to 28
days after transfection. The experiment was repeated 4
times with similar results.

The expression of BRCA1 alone (pcDNA3-BRCA1:pCMV5)
decreased the number of colonies formed by these cells
when compared to the empty vector control (pcDNA3:pCMV5),
in agreement with other studies [Holt et al, cited
above]. The co-expression of BRCA1 and BAP1 (pcDNA3-
BRCA1:pCMV5-BAP1) significantly decreased the number of
cell colonies (approximately 4 fold vs. BRCA1 alone),
indicating that BAP1 enhances the growth suppressive
actions of BRCA1. A mutant of BAP1, BAP1 (aa 165-729), in
which the enzymatic region is deleted but which still
binds to BRCA1 (data not shown), also enhanced the growth
suppression of BRCA1, but not to the same extent as the
wildtype BAP1.

In contrast to BRCA1, the expression of BRCA1-Δ11
(BRCA1 missing the 11th exon) in MCF7 cells by itself had
no effect on the growth of MCF7 cells. However, the
co-expression of BRCA1-Δ11 and BAP1 significantly
decreased the number of colonies, suggesting that the
presence of BAP1 could functionally substitute for the
missing 11th exon of BRCA1 and/or that BAP1 itself was an
inhibitor of cell growth.

In support of this latter hypothesis, the expression
of BAP1 in MCF7 cells did somewhat reduce the number of
colonies formed (pcDNA3:pCMV5-BAP1). The expression of
the enzymatic mutant, BAP1(165-729), alone or in
combination with BRCA1-Δ11 yielded the same number of
colonies. Thus, enzymatically active BAP1 enhances
BRCA1-mediated suppression of growth.

Example 10 - BAP1 Enzymatic Assay

To determine whether BAP1 did indeed have UCH
activity, the BAP1 cDNA was expressed in bacteria and
this protein was assayed for the ability to hydrolyze the
glycine 76 ethyl ester of ubiquitin [Ub-OEt; Mayer et al,
Biochemistry, 28:166-172 (1989)].

Briefly, bacteria (E. coli DH5α) harboring an
IPTG-inducible expression plasmid containing BAP1 or an
enzymatically null mutant, BAP1(C91S) (pQE-30; QIAGEN
Inc.) were grown and induced with 1 mM IPTG for 4 hours.
The bacteria were collected and the pellets were
resuspended to 1/20 volume (original culture) in lysate
buffer (50 mM Tris, pH 8.0, 25 mM EDTA, 10 mM
2-mercaptoethanol, 100 µg/ml lysozyme). The lysates
were sonicated and centrifuged at 40,000 x g.

The pellets were resuspended in a volume equal to
that of the supernatant and samples of both pellet and
supernatant were analyzed by SDS-PAGE for expression
levels and inclusion body formation. Induction of
protein was verified by SDS-PAGE of each fraction.
Overexpression of BAP1 in bacteria led to abundant
protein, most of which was found in an inactive,
insoluble form.

Assays for BAP1 enzymatic activity, specifically,
ubiquitin carboxy-terminal hydrolase activity, were
performed on the above-described soluble fraction
essentially as described for the UCH-L1 and UCH-L3
enzymes using the glycine 76 ethyl ester of ubiquitin
(Ub-OEt) as a substrate [Mayer et al, cited above;
Wilkinson et al, Biochemistry, 25:6644-6649 (1986)].
Assays were done in triplicate. The peak areas were
integrated and normalized with respect to a ubiquitin
standard.
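
The integration and normalization step can be sketched as follows, under the assumption that each assay yields a detector trace sampled over time and that the product peak has already been isolated from the baseline. The trapezoidal rule, the function names and all numbers below are illustrative assumptions; the specification does not state how the peak areas were integrated.

```python
from statistics import mean


def trapezoid_area(times, signal):
    """Integrate a sampled peak with the trapezoidal rule."""
    if len(times) != len(signal) or len(times) < 2:
        raise ValueError("need two or more paired samples")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (signal[i] + signal[i - 1]) / 2.0
    return area


def normalized_activity(replicate_areas, standard_area):
    """Mean replicate peak area expressed relative to the standard."""
    if standard_area <= 0:
        raise ValueError("standard area must be positive")
    return mean(replicate_areas) / standard_area


# Hypothetical traces: three replicates and one ubiquitin standard.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
replicates = [
    [0.0, 1.0, 2.1, 1.0, 0.0],
    [0.0, 0.9, 2.0, 1.1, 0.0],
    [0.0, 1.1, 1.9, 0.9, 0.0],
]
standard = [0.0, 2.0, 4.0, 2.0, 0.0]

areas = [trapezoid_area(times, trace) for trace in replicates]
print(normalized_activity(areas, trapezoid_area(times, standard)))
```

Normalizing to a standard run under the same conditions removes run-to-run differences in detector response, so triplicates can be averaged on a common scale.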

The BAP1 protein found in the soluble fraction was
able to hydrolyze Ub-OEt and the level of this activity
increased with the level of protein, indicating that BAP1
contains UCH-like enzymatic activity.

The active site thiol residue responsible for UCH
activity in UCH-L3 has been identified and its mutation
leads to abolition of enzyme activity [Larsen et al,
cited above]. Mutation of the corresponding cysteine
residue in BAP1, BAP1(C91S), yielded a protein with no
UCH activity, further suggesting that BAP1 is a thiol
protease of the UCH family.

BAP1's identity as a protease of the ubiquitin
carboxy-terminal hydrolase (UCH) family implies a role
for either ubiquitin-mediated, proteasome-dependent
degradation or other ubiquitin-mediated regulatory
pathways [Isaksson et al, Biochimica et Biophysica Acta,
1288:F21-29 (1996)] in BRCA1 function. Regulated
ubiquitination of proteins and subsequent
proteasome-dependent proteolysis plays a role in almost
every cellular growth, differentiation and homeostatic
process [reviewed by Ciechanover, Biol. Chem. Hoppe-
Seyler, 375:565-581 (1994); Isaksson et al, cited above;
Wilkinson, Annual Review of Nutrition, 15:161-189
(1995)]. This pathway can be broadly subdivided into
reactions involving 1) pro-ubiquitin processing and
ATP-dependent activation of ubiquitin; 2) substrate
recognition, conjugation and editing of the polyubiquitin
chain; 3) proteasome-dependent degradation of the
ubiquitin protein; and 4) cleavage and/or debranching of
peptide-ubiquitin conjugates and recycling of ubiquitin
to cellular pools. The pathway is regulated at almost
every step: first, at the level of substrate specificity
via the concerted actions of activating enzymes, carrier
proteins and ligation enzymes, and secondly, at the level
of proteolytic deubiquitination and ubiquitin hydrolysis.

The UCH family has been characterized as a set of
small (25-30 kDa) cytoplasmic proteins which prefer to
cleave ubiquitin from ubiquitin-conjugated small
substrates and may also be involved in the
co-translational processing of proubiquitin. UCHs show
considerable tissue specificity and developmentally-timed
regulation [Wilkinson et al, Biochem. Soc. Trans.,
20:631-637 (1992)]. UCH family members are strongly and
differentially expressed in neuronal, hematopoietic and
germ cells in many species. Most remarkably, a novel UCH
enzyme has recently been cloned from Aplysia californica
whose enzymatic function is essential for acquisition and
maintenance of long-term memory [Hegde et al, Cell,
89:114-126 (1997)]. Finally, UCH levels are strongly
downregulated during viral transformation of fibroblasts
[Honore et al, FEBS Letters, 280:235-240 (1991)],
consistent with a role in growth control.

BAP1 is the newest member of the UCH family and
considerably expands the potential roles of this family
of proteases. BAP1 is a much larger protein (90 kDa) and
is the first nuclear-localized UCH. BAP1 is also likely
to be involved in the regulation of protein subcellular
localization. Ubiquitin, or a ubiquitin-like moiety, may
affect the specific targeting of proteins to locations
other than the proteasome [see, e.g., Mahajan et al,
Cell, 88:97-107 (1997)]. BAP1-mediated removal of
"ubiquitin" from BRCA1, or a protein associated with
BRCA1, could target it for removal to another cellular
compartment, thus functionally destroying the protein
without physically doing so.

BRCA1 is also localized in nuclear dot structures in
a cell-cycle dependent manner [Scully et al, cited
above]. This association of BRCA1 with RAD51 in both
mitotic and meiotic cells broadly implicates BRCA1 in DNA
repair and/or recombination processes. The
RAD51/52-dependent DNA repair pathway is highly regulated
and includes many proteins, some of which may be
potential substrates for BAP1-mediated ubiquitin
hydrolysis [Watkins et al, Molecular & Cellular Biology,
13:7757-7765 (1993)]. Thus, it appears that the DNA
repair machinery contains both ubiquitin-conjugating and
-hydrolyzing elements, since BAP1 is now implicated as a
member of the BRCA1/RAD51/hUBC9 complex. It is possible
that BAP1, which is co-expressed with BRCA1 in testis,
may regulate the recombination/repair functions of the
BRCA1/RAD52 complex by targeting either RAD23 or UBL1 for
ubiquitin hydrolysis.

Numerous modifications and variations of the present 
invention are included in the above-identified 
specification and are expected to be obvious to one of 
skill in the art. Such modifications and alterations to 

30 the compositions and processes of the present invention 

are believed to be encompassed in the scope of the claims 
appended hereto.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Wistar Institute of Anatomy & Biology
Rauscher III, Frank J.
Jensen, David E.

(ii) TITLE OF INVENTION: BRCA1 Associated Protein (BAP-1) and Uses Therefor

(iii) NUMBER OF SEQUENCES: 47

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Howson and Howson

(B) STREET: P.O. Box 457, 321 Norristown Road

(C) CITY: Spring House

(D) STATE: Pennsylvania

(E) COUNTRY: U.S.A.

(F) ZIP: 19477

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: WO

(B) FILING DATE:

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 60/022,997

(B) FILING DATE: 02-AUG-1996

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 60/038,109

(B) FILING DATE: 19-FEB-1997

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Bak, Mary E.

(B) REGISTRATION NUMBER: 31,215

(C) REFERENCE/DOCKET NUMBER: WST68BPCT

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 215-540-9200

(B) TELEFAX: 215-540-5818

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3517 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 40..2226

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CCCACGAGGC ATGGCGCTGA GGGGCCGCCC CGCGGGAAG ATG AAT AAG GGC TGG 54
Met Asn Lys Gly Trp
1 5

CTG GAG CTG GAG AGC GAC CCA GGC CTC TTC ACC CTG CTC GTG GAA GAT 102
Leu Glu Leu Glu Ser Asp Pro Gly Leu Phe Thr Leu Leu Val Glu Asp
10 15 20

TTC GGT GTC AAG GGG GTG CAA GTG GAG GAG ATC TAC GAC CTT CAG AGC 150
Phe Gly Val Lys Gly Val Gln Val Glu Glu Ile Tyr Asp Leu Gln Ser
25 30 35

AAA TGT CAG GGC CCT GTA TAT GGA TTT ATC TTC CTG TTC AAA TGG ATC 198
Lys Cys Gln Gly Pro Val Tyr Gly Phe Ile Phe Leu Phe Lys Trp Ile
40 45 50

GAA GAG CGC CGG TCC CGG CGA AAG GTC TCT ACC TTG GTG GAT GAT ACG 246
Glu Glu Arg Arg Ser Arg Arg Lys Val Ser Thr Leu Val Asp Asp Thr
55 60 65

TCC GTG ATT GAT GAT GAT ATT GTG AAT AAC ATG TTC TTT GCC CAC CAG 294
Ser Val Ile Asp Asp Asp Ile Val Asn Asn Met Phe Phe Ala His Gln
70 75 80 85

CTG ATA CCC AAC TCT TGT GCA ACT CAT GCC TTG CTG AGC GTG CTC CTG 342
Leu Ile Pro Asn Ser Cys Ala Thr His Ala Leu Leu Ser Val Leu Leu
90 95 100

AAC TGC AGC AGC GTG GAC CTG GGA CCC ACC CTG AGT CGC ATG AAG GAC 390
Asn Cys Ser Ser Val Asp Leu Gly Pro Thr Leu Ser Arg Met Lys Asp
105 110 115

TTC ACC AAG GGT TTC AGC CCT GAG AGC AAA GGA TAT GCG ATT GGC AAT 438
Phe Thr Lys Gly Phe Ser Pro Glu Ser Lys Gly Tyr Ala Ile Gly Asn
120 125 130

GCC CCG GAG TTG GCC AAG GCC CAT AAT AGC CAT GCC AGG CCC GAG CCA 486
Ala Pro Glu Leu Ala Lys Ala His Asn Ser His Ala Arg Pro Glu Pro
135 140 145

CGC CAC CTC CCT GAG AAG CAG AAT GGC CTT AGT GCA GTG CGG ACC ATG 534
Arg His Leu Pro Glu Lys Gln Asn Gly Leu Ser Ala Val Arg Thr Met
150 155 160 165

GAG GCG TTC CAC TTT GTC AGC TAT GTG CCT ATC ACA GGC CGG CTC TTT 582
Glu Ala Phe His Phe Val Ser Tyr Val Pro Ile Thr Gly Arg Leu Phe
170 175 180

GAG CTG GAT GGG CTG AAG GTC TAC CCC ATT GAC CAT GGG CCC TGG GGG 630
Glu Leu Asp Gly Leu Lys Val Tyr Pro Ile Asp His Gly Pro Trp Gly
185 190 195

GAG GAC GAG GAG TGG ACA GAC AAG GCC CGG CGG GTC ATC ATG GAG CGT 678
Glu Asp Glu Glu Trp Thr Asp Lys Ala Arg Arg Val Ile Met Glu Arg

200 205 210

ATC GGC CTC GCC ACT GCA GGG GAG CCC TAC CAC GAC ATC CGC TTC AAC 726
Ile Gly Leu Ala Thr Ala Gly Glu Pro Tyr His Asp Ile Arg Phe Asn
215 220 225

CTG ATG GCA GTG GTG CCC GAC CGC AGG ATC AAG TAT GAG GCC AGG CTG 774
Leu Met Ala Val Val Pro Asp Arg Arg Ile Lys Tyr Glu Ala Arg Leu
230 235 240 245

CAT GTG CTG AAG GTG AAC CGT CAG ACA GTA CTA GAG GCT CTG CAG CAG 822
His Val Leu Lys Val Asn Arg Gln Thr Val Leu Glu Ala Leu Gln Gln
250 255 260

CTG ATA AGA GTA ACA CAG CCA GAG CTG ATT CAG ACC CAC AAG TCT CAA 870
Leu Ile Arg Val Thr Gln Pro Glu Leu Ile Gln Thr His Lys Ser Gln
265 270 275

GAG TCA CAG CTG CCT GAG GAG TCC AAG TCA GCC AGC AAC AAG TCC CCG 918
Glu Ser Gln Leu Pro Glu Glu Ser Lys Ser Ala Ser Asn Lys Ser Pro
280 285 290

CTG GTG CTG GAA GCA AAC AGG GCC CCT GCA GCC TCT GAG GGC AAC CAC 966
Leu Val Leu Glu Ala Asn Arg Ala Pro Ala Ala Ser Glu Gly Asn His
295 300 305

ACA GAT GGT GCA GAG GAG GCG GCT GGT TCA TGC GCA CAA GCC CCA TCC 1014
Thr Asp Gly Ala Glu Glu Ala Ala Gly Ser Cys Ala Gln Ala Pro Ser
310 315 320 325

CAC AGC CCT CCC AAC AAA CCC AAG CTA GTG GTG AAG CCT CCA GGC AGC 1062
His Ser Pro Pro Asn Lys Pro Lys Leu Val Val Lys Pro Pro Gly Ser
330 335 340

AGC CTC AAT GGG GTT CAC CCC AAC CCC ACT CCC ATT GTC CAG CGG CTG 1110
Ser Leu Asn Gly Val His Pro Asn Pro Thr Pro Ile Val Gln Arg Leu
345 350 355

CCG GCC TTT CTA GAC AAT CAC AAT TAT GCC AAG TCC CCC ATG CAG GAG 1158
Pro Ala Phe Leu Asp Asn His Asn Tyr Ala Lys Ser Pro Met Gln Glu
360 365 370

GAA GAA GAC CTG GCG GCA GGT GTG GGC CGC AGC CGA GTT CCA GTC CGC 1206
Glu Glu Asp Leu Ala Ala Gly Val Gly Arg Ser Arg Val Pro Val Arg
375 380 385

CCA CCC CAG CAG TAC TCA GAT GAT GAG GAT GAC TAT GAG GAT GAC GAG 1254
Pro Pro Gln Gln Tyr Ser Asp Asp Glu Asp Asp Tyr Glu Asp Asp Glu
390 395 400 405

GAG GAT GAC GTG CAG AAC ACC AAC TCT GCC CTT AGG TAT AAG GGG AAG 1302
Glu Asp Asp Val Gln Asn Thr Asn Ser Ala Leu Arg Tyr Lys Gly Lys
410 415 420

GGA ACA GGG AAG CCA GGG GCA TTG AGC GGT TCT GCT GAT GGG CAA CTG 1350
Gly Thr Gly Lys Pro Gly Ala Leu Ser Gly Ser Ala Asp Gly Gln Leu
425 430 435

TCA GTG CTG CAG CCC AAC ACC ATC AAC GTC TTG GCT GAG AAG CTC AAA 1398
Ser Val Leu Gln Pro Asn Thr Ile Asn Val Leu Ala Glu Lys Leu Lys
440 445 450

GAG TCC CAG AAG GAC CTC TCA ATT CCT CTG TCC ATC AAG ACT AGC AGC 1446
Glu Ser Gln Lys Asp Leu Ser Ile Pro Leu Ser Ile Lys Thr Ser Ser

455 460 465

GGG GCT GGG AGT CCG GCT GTG GCA GTG CCC ACA CAC TCG CAG CCC TCA 1494
Gly Ala Gly Ser Pro Ala Val Ala Val Pro Thr His Ser Gln Pro Ser
470 475 480 485

CCC ACC CCC AGC AAT GAG AGT ACA GAC ACG GCC TCT GAG ATC GGC AGT 1542
Pro Thr Pro Ser Asn Glu Ser Thr Asp Thr Ala Ser Glu Ile Gly Ser
490 495 500

GCT TTC AAC TCG CCA CTG CGC TCG CCT ATC CGC TCA GCC AAC CCG ACG 1590
Ala Phe Asn Ser Pro Leu Arg Ser Pro Ile Arg Ser Ala Asn Pro Thr
505 510 515

CGG CCC TCC AGC CCT GTC ACC TCC CAC ATC TCC AAG GTG CTT TTT GGA 1638
Arg Pro Ser Ser Pro Val Thr Ser His Ile Ser Lys Val Leu Phe Gly
520 525 530

GAG GAT GAC AGC CTG CTG CGT GTT GAC TGC ATA CGC TAC AAC CGT GCT 1686
Glu Asp Asp Ser Leu Leu Arg Val Asp Cys Ile Arg Tyr Asn Arg Ala
535 540 545

GTC CGT GAT CTG GGT CCT GTC ATC AGC ACA GGC CTG CTG CAC CTG GCT 1734
Val Arg Asp Leu Gly Pro Val Ile Ser Thr Gly Leu Leu His Leu Ala
550 555 560 565

GAG GAT GGG GTG CTG AGT CCC CTG GCG CTG ACA GAG GGT GGG AAG GGT 1782
Glu Asp Gly Val Leu Ser Pro Leu Ala Leu Thr Glu Gly Gly Lys Gly
570 575 580

TCC TCG CCC TCC ATC AGA CCA ATC CAA GGC AGC CAG GGG TCC AGC AGC 1830
Ser Ser Pro Ser Ile Arg Pro Ile Gln Gly Ser Gln Gly Ser Ser Ser
585 590 595

CCA GTG GAG AAG GAG GTC GTG GAA GCC ACG GAC AGC AGA GAG AAG ACG 1878
Pro Val Glu Lys Glu Val Val Glu Ala Thr Asp Ser Arg Glu Lys Thr
600 605 610

GGG ATG GTG AGG CCT GGC GAG CCC TTG AGT GGG GAG AAA TAC TCA CCC 1926
Gly Met Val Arg Pro Gly Glu Pro Leu Ser Gly Glu Lys Tyr Ser Pro
615 620 625

AAG GAG CTG CTG GCA CTG CTG AAG TGT GTG GAG GCT GAG ATT GCA AAC 1974
Lys Glu Leu Leu Ala Leu Leu Lys Cys Val Glu Ala Glu Ile Ala Asn
630 635 640 645

TAT GAG GCG TGC CTC AAG GAG GAG GTA GAG AAG AGG AAG AAG TTC AAG 2022
Tyr Glu Ala Cys Leu Lys Glu Glu Val Glu Lys Arg Lys Lys Phe Lys
650 655 660

ATT GAT GAC CAG AGA AGG ACC CAC AAC TAC GAT GAG TTC ATC TGC ACC 2070
Ile Asp Asp Gln Arg Arg Thr His Asn Tyr Asp Glu Phe Ile Cys Thr
665 670 675

TTT ATC TCC ATG CTG GCT CAG GAA GGC ATG CTG GCC AAC CTA GTG GAG 2118
Phe Ile Ser Met Leu Ala Gln Glu Gly Met Leu Ala Asn Leu Val Glu
680 685 690

CAG AAC ATC TCC GTG CGG CGG CGC CAA GGG GTC AGC ATC GGC CGG CTC 2166
Gln Asn Ile Ser Val Arg Arg Arg Gln Gly Val Ser Ile Gly Arg Leu
695 700 705

CAC AAG CAG CGG AAG CCT GAC CGG CGG AAA CGC TCT CGC CCC TAC AAG 2214
His Lys Gln Arg Lys Pro Asp Arg Arg Lys Arg Ser Arg Pro Tyr Lys

710 715 720 725

GCC AAG CGC CAG TGAGGACTGC TGGCCCTGAC TCTGCAGCCC ACTCTTGCCG 2266
Ala Lys Arg Gln

TGTGGCCCTC ACCAGGGTCC TTCCCTGCCC CACTTCCCCT TTTCCCAGTA TTACTGAATA 2326

GTCCCAGCTG GAGAGTCCAG GCCCTGGGAA TGGGAGGAAC CAGGCCACAT TCCTTCCATC 2386

GTGCCCTGAG GCCTGACACG GCAGATCAGC CCCATAGTGC TCAGGAGGCA GCATCTGGAG 2446

TTGGGGCACA GCGAGGTACT GCAGCTTCCT CCACAGCCGG CTGTGGAGCA GCAGGACCTG 2506

GCCCTTCTGC CTGGGCAGCA GAATATATAT TTTACCTATC AGAGACATCT ATTTTTCTGG 2566

GCTCCAACCC AACATGCCAC CATGTTGACA TAAGTTCCTA CCTGACTATG CTTTCTCTCC 2626

TAGGAGCTGT CCTGGTGGGC CCAGGTCCTT GTATCATCCA CGGTCCCAAC TACAGGGTCC 2686

TAGCTGGGGG CCTGGGTGGG CCCTGGGCTC TGGGCCCTGC TGCTCTAGCC CCAGCCACCA 2746

GCCTGTCCCT GTTGTAAGGA AGCCAGGTCT TCTCTCTTCA TTCCTCTTAG GAGAGTGCCA 2806

AACTCAGGGA CCCAGCACTG GGCTGGGTTG GGAGTAGGGT GTCCCAGTGG GGTTGGGGTG 2866

AGCAGGCTGC TGGGATCCCA TGGCCTGAGC AGAGCATGTG GGAACTGTTC AGTGGCCTGT 2926

GAACTGTCTT CCTTGTTCTA GCCAGGCTGT TCAAGACTGC TCTCCATAGC AAGGTTCTAG 2986

GGCTCTTCGC CTTCAGTGTT GTGGCCCTAG CTATGGGCCT AAATTGGGCT CTAGGTCTCT 3046

GTCCCTGGCG CTTGAGGCTC AGAAGAGCCT CTGTCCAGCC CCTCAGTATT ACCATGTCTC 3106

CCTCTCAGGG GTAGCAGAGA CAGGGTTGCT TATAGGAAGC TGGCACCACT CAGCTCTTCC 3166

TGCTACTCCA GTTTCCTCAG CCTCTGCAAG GCACTCAGGG TGGGGGACAG CAGGATCAAG 3226

ACAACCCGTT GGAGCCCCTG TGTTCCAGAG GACCTGATGC CAAGGGGTAA TGGGCCCAGC 3286

AGTGCCTCTG GAGCCCAGGC CCCAACACAG CCCCATGGCC TCTCCAGATG GCTTTGAAAA 3346

GGTGATCCAA CAGGCCCCTT TATCTGTACA TAGTGACTGA GTGGGGGGTG CTGGCAACTG 3406

TGGCACTCCT CTGGGCTGAG CACAGCTTGA CCCCTCTAGC CCCTGTAAAA CTGGATCAAT 3466

GAATGAATAA AACTCTCCTA AGATCTCCTG AGAAAAAAAA AAAAAAAAAG G 3517
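
As an arithmetic cross-check on the listing for SEQ ID NO:1, the CDS feature (LOCATION: 40..2226 within a declared 3517 base pairs) can be compared with the 729 amino acids declared for SEQ ID NO:2. The Python sketch below is illustrative only and is not part of the filed listing; the helper name `cds_summary` is hypothetical, and the only inputs are the four numbers copied from the listing. It assumes, as the listing indicates, that positions are 1-based and inclusive and that the stop codon lies outside the stated CDS range.

```python
# Values copied from the sequence listing (SEQ ID NO:1 and SEQ ID NO:2).
CDS_START = 40        # first base of the ATG start codon
CDS_END = 2226        # last base of the final sense codon (CAG, Gln)
TOTAL_LENGTH = 3517   # declared length of SEQ ID NO:1, in base pairs
PROTEIN_LENGTH = 729  # declared length of SEQ ID NO:2, in amino acids


def cds_summary(start, end, total):
    """Hypothetical helper: return (coding bases, codons, 5' flank, 3' flank).

    Positions are treated as 1-based and inclusive, as in the listing.
    The 3' flank counts every base after the CDS, including the stop codon.
    """
    coding = end - start + 1
    if coding % 3 != 0:
        raise ValueError("CDS length is not a whole number of codons")
    return coding, coding // 3, start - 1, total - end


coding, codons, five_prime, three_prime = cds_summary(
    CDS_START, CDS_END, TOTAL_LENGTH
)
print(coding, codons, five_prime, three_prime)

# The number of sense codons should equal the declared protein length.
assert codons == PROTEIN_LENGTH
```

Under these assumptions the coding region spans 2187 bases, that is 729 codons, preceded by 39 bases of 5' sequence and followed by 1291 bases of 3' sequence, which agrees with the lengths stated in the listing.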



(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 729 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Asn Lys Gly Trp Leu Glu Leu Glu Ser Asp Pro Gly Leu Phe Thr
1 5 10 15

Leu Leu Val Glu Asp Phe Gly Val Lys Gly Val Gln Val Glu Glu Ile
20 25 30

Tyr Asp Leu Gln Ser Lys Cys Gln Gly Pro Val Tyr Gly Phe Ile Phe
35 40 45

Leu Phe Lys Trp Ile Glu Glu Arg Arg Ser Arg Arg Lys Val Ser Thr
50 55 60

Leu Val Asp Asp Thr Ser Val Ile Asp Asp Asp Ile Val Asn Asn Met
65 70 75 80

Phe Phe Ala His Gln Leu Ile Pro Asn Ser Cys Ala Thr His Ala Leu
85 90 95

Leu Ser Val Leu Leu Asn Cys Ser Ser Val Asp Leu Gly Pro Thr Leu
100 105 110

Ser Arg Met Lys Asp Phe Thr Lys Gly Phe Ser Pro Glu Ser Lys Gly
115 120 125

Tyr Ala Ile Gly Asn Ala Pro Glu Leu Ala Lys Ala His Asn Ser His
130 135 140

Ala Arg Pro Glu Pro Arg His Leu Pro Glu Lys Gln Asn Gly Leu Ser
145 150 155 160

Ala Val Arg Thr Met Glu Ala Phe His Phe Val Ser Tyr Val Pro Ile
165 170 175

Thr Gly Arg Leu Phe Glu Leu Asp Gly Leu Lys Val Tyr Pro Ile Asp
180 185 190

His Gly Pro Trp Gly Glu Asp Glu Glu Trp Thr Asp Lys Ala Arg Arg
195 200 205

Val Ile Met Glu Arg Ile Gly Leu Ala Thr Ala Gly Glu Pro Tyr His
210 215 220

Asp Ile Arg Phe Asn Leu Met Ala Val Val Pro Asp Arg Arg Ile Lys
225 230 235 240

Tyr Glu Ala Arg Leu His Val Leu Lys Val Asn Arg Gln Thr Val Leu
245 250 255

Glu Ala Leu Gln Gln Leu Ile Arg Val Thr Gln Pro Glu Leu Ile Gln
260 265 270

Thr His Lys Ser Gln Glu Ser Gln Leu Pro Glu Glu Ser Lys Ser Ala
275 280 285

Ser Asn Lys Ser Pro Leu Val Leu Glu Ala Asn Arg Ala Pro Ala Ala
290 295 300

Ser Glu Gly Asn His Thr Asp Gly Ala Glu Glu Ala Ala Gly Ser Cys
305 310 315 320

Ala Gln Ala Pro Ser His Ser Pro Pro Asn Lys Pro Lys Leu Val Val
325 330 335

Lys Pro Pro Gly Ser Ser Leu Asn Gly Val His Pro Asn Pro Thr Pro
340 345 350

Ile Val Gln Arg Leu Pro Ala Phe Leu Asp Asn His Asn Tyr Ala Lys
355 360 365

Ser Pro Met Gln Glu Glu Glu Asp Leu Ala Ala Gly Val Gly Arg Ser
370 375 380

Arg Val Pro Val Arg Pro Pro Gln Gln Tyr Ser Asp Asp Glu Asp Asp
385 390 395 400

Tyr Glu Asp Asp Glu Glu Asp Asp Val Gln Asn Thr Asn Ser Ala Leu
405 410 415

Arg Tyr Lys Gly Lys Gly Thr Gly Lys Pro Gly Ala Leu Ser Gly Ser
420 425 430

Ala Asp Gly Gln Leu Ser Val Leu Gln Pro Asn Thr Ile Asn Val Leu
435 440 445

Ala Glu Lys Leu Lys Glu Ser Gln Lys Asp Leu Ser Ile Pro Leu Ser
450 455 460

Ile Lys Thr Ser Ser Gly Ala Gly Ser Pro Ala Val Ala Val Pro Thr
465 470 475 480

His Ser Gln Pro Ser Pro Thr Pro Ser Asn Glu Ser Thr Asp Thr Ala
485 490 495

Ser Glu Ile Gly Ser Ala Phe Asn Ser Pro Leu Arg Ser Pro Ile Arg
500 505 510

Ser Ala Asn Pro Thr Arg Pro Ser Ser Pro Val Thr Ser His Ile Ser
515 520 525

Lys Val Leu Phe Gly Glu Asp Asp Ser Leu Leu Arg Val Asp Cys Ile
530 535 540

Arg Tyr Asn Arg Ala Val Arg Asp Leu Gly Pro Val Ile Ser Thr Gly
545 550 555 560

Leu Leu His Leu Ala Glu Asp Gly Val Leu Ser Pro Leu Ala Leu Thr
565 570 575

Glu Gly Gly Lys Gly Ser Ser Pro Ser Ile Arg Pro Ile Gln Gly Ser
580 585 590

Gln Gly Ser Ser Ser Pro Val Glu Lys Glu Val Val Glu Ala Thr Asp
595 600 605

Ser Arg Glu Lys Thr Gly Met Val Arg Pro Gly Glu Pro Leu Ser Gly
610 615 620

Glu Lys Tyr Ser Pro Lys Glu Leu Leu Ala Leu Leu Lys Cys Val Glu
625 630 635 640

Ala Glu Ile Ala Asn Tyr Glu Ala Cys Leu Lys Glu Glu Val Glu Lys
645 650 655

Arg Lys Lys Phe Lys Ile Asp Asp Gln Arg Arg Thr His Asn Tyr Asp
660 665 670

Glu Phe Ile Cys Thr Phe Ile Ser Met Leu Ala Gln Glu Gly Met Leu
675 680 685

Ala Asn Leu Val Glu Gln Asn Ile Ser Val Arg Arg Arg Gln Gly Val
690 695 700

Ser Ile Gly Arg Leu His Lys Gln Arg Lys Pro Asp Arg Arg Lys Arg
705 710 715 720

Ser Arg Pro Tyr Lys Ala Lys Arg Gln
725



(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 100 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu
100



(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 100 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Met Asp Leu Ser Ala Val Gln Ile Gln Glu Val Gln Asn Val Leu His
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Glu Ile Thr Lys Arg Ser Leu Gln Gly Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Ala Glu Glu Leu Leu Arg Ile Met Ala Ala Phe Glu Leu Asp
85 90 95

Thr Gly Met Gln
100

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 100 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Ala Ser Ser Val Leu Glu Met Ile Lys Glu Glu Val Thr Cys Pro
1 5 10 15

Ile Cys Leu Glu Leu Leu Lys Glu Pro Val Ser Ala Asp Cys Asn His
20 25 30

Ser Phe Cys Arg Ala Cys Ile Thr Leu Asn Tyr Glu Ser Asn Arg Asn
35 40 45

Thr Asp Gly Lys Gly Asn Cys Pro Val Cys Arg Val Pro Tyr Pro Phe
50 55 60

Gly Asn Leu Arg Pro Asn Leu His Val Ala Asn Ile Val Glu Arg Leu
65 70 75 80

Lys Gly Phe Lys Ser Ile Pro Glu Glu Glu Gln Lys Val Asn Ile Cys
85 90 95

Ala Gln His Gly
100

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 262 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Lys Lys Glu Ile Trp Asn Ser Asp Pro Arg Gly His Glu Gly Pro Gln
1 5 10 15

Pro Ser Pro Thr Pro Ser Asn Glu Ser Thr Asp Thr Ala Ser Glu Ile
20 25 30

Gly Ser Ala Phe Asn Ser Pro Leu Arg Ser Pro Ile Arg Ser Ala Asn
35 40 45

Pro Thr Arg Pro Ser Ser Pro Val Thr Ser His Ile Ser Lys Val Leu
50 55 60

Phe Gly Glu Asp Asp Ser Leu Leu Arg Val Asp Cys Ile Arg Tyr Asn
65 70 75 80

Arg Ala Val Arg Asp Leu Gly Pro Val Ile Ser Thr Gly Leu Leu His
85 90 95

Leu Ala Glu Asp Gly Val Leu Ser Pro Leu Ala Leu Thr Glu Gly Gly
100 105 110

Lys Gly Ser Ser Pro Ser Ile Arg Pro Ile Gln Gly Ser Gln Gly Ser
115 120 125

Ser Ser Pro Val Glu Lys Glu Val Val Glu Ala Thr Asp Ser Arg Glu
130 135 140

Lys Thr Gly Met Val Arg Ser Gly Glu Pro Leu Ser Gly Glu Lys Tyr
145 150 155 160

Ser Pro Lys Glu Leu Leu Ala Leu Leu Lys Cys Val Glu Ala Glu Ile
165 170 175

Ala Asn Tyr Glu Ala Cys Leu Lys Glu Glu Val Glu Lys Arg Lys Lys
180 185 190

Phe Lys Ile Asp Asp Gln Arg Arg Thr His Asn Tyr Asp Glu Phe Ile
195 200 205

Cys Thr Phe Ile Ser Met Leu Ala Gln Glu Gly Met Leu Ala Asn Leu
210 215 220

Val Glu Gln Asn Ile Ser Val Arg Arg Arg Gln Gly Val Ser Ile Gly
225 230 235 240

Arg Leu His Lys Gln Arg Lys Pro Asp Arg Arg Lys Arg Ser Arg Pro
245 250 255

Tyr Lys Ala Lys Arg Gln
260



(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 188 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Gly Gly Ile Asp Trp Ile Pro Gly Tyr Arg Ala Gln Ile Arg Arg Pro
1 5 10 15

Ser Ser Pro Val Thr Ser His Ile Ser Lys Val Leu Phe Gly Glu Asp
20 25 30

Asp Ser Leu Leu Arg Val Asp Cys Ile Arg Tyr Asn Arg Ala Val Arg
35 40 45

Asp Leu Gly Pro Val Ile Ser Thr Gly Leu Leu His Leu Ala Glu Asp
50 55 60

Gly Val Leu Ser Pro Leu Ala Leu Thr Glu Gly Gly Lys Gly Ser Ser
65 70 75 80

Pro Ser Thr Arg Ser Ser Gln Gly Ser Gln Gly Ser Ser Gly Leu Glu
85 90 95

Glu Lys Glu Val Val Glu Val Thr Glu Ser Arg Asp Lys Pro Gly Leu
100 105 110

Asn Arg Ser Ser Glu Pro Leu Ser Gly Glu Lys Tyr Ser Pro Lys Ile
115 120 125

Asp Asp Gln Arg Arg Thr His Asn Tyr Asp Glu Phe Ile Cys Thr Phe
130 135 140

Ile Ser Met Leu Ala Gln Glu Gly Met Leu Ala Asn Leu Val Glu Gln
145 150 155 160

Asn Ile Ser Val Arg Arg Arg Gln Gly Val Ser Ile Gly Arg Leu His
165 170 175

Lys Gln Arg Lys Pro Asp Arg Arg Met Ser Gly Arg
180 185



(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 161 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Gly Gly Ile Asp Trp Ile Pro Gly Val Arg Ala Gln Ile Arg Pro Ile
1 5 10 15

Ser Ser Ser Ser Pro Ser Thr Arg Ser Ser Gln Gly Ser Gln Gly Ser
20 25 30

Ser Gly Leu Glu Glu Lys Glu Val Val Glu Val Thr Glu Ser Arg Asp
35 40 45

Lys Pro Gly Leu Asn Arg Ser Ser Glu Pro Leu Ser Gly Glu Lys Tyr
50 55 60

Ser Pro Lys Glu Leu Leu Ala Leu Leu Lys Cys Ala Glu Ala Glu Ile
65 70 75 80

Ala Asn Tyr Glu Ala Cys Leu Lys Glu Glu Val Glu Lys Arg Lys Lys
85 90 95

Phe Lys Ile Asp Asp Gln Arg Arg Thr His Asn Tyr Asp Glu Phe Ile
100 105 110

Cys Thr Phe Ile Ser Met Leu Ala Gln Glu Gly Met Leu Ala Asn Leu
115 120 125

Val Glu Gln Asn Ile Ser Val Arg Arg Arg Gln Gly Val Ser Ile Gly
130 135 140

Arg Leu His Lys Gln Arg Lys Pro Asp Arg Arg Lys Arg Ile Ser Gly
145 150 155 160

Arg



(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 149 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Gly Gly Ile Asp Trp Ile Pro Gly Tyr Arg Ala Gln Ile Arg Pro Ile
1 5 10 15

Ser Ser Gly Leu Glu Glu Lys Glu Val Val Glu Val Thr Glu Ser Arg
20 25 30

Asp Lys Pro Gly Leu Asn Arg Ser Ser Glu Pro Leu Ser Gly Glu Lys
35 40 45

Tyr Ser Pro Lys Glu Leu Leu Ala Leu Leu Lys Cys Val Glu Ala Glu
50 55 60

Ile Ala Asn Tyr Glu Ala Cys Leu Lys Glu Glu Val Glu Lys Arg Lys
65 70 75 80

Lys Phe Lys Ile Asp Asp Gln Arg Arg Thr His Asn Tyr Asp Glu Phe
85 90 95

Ile Cys Thr Phe Ile Ser Met Leu Ala Gln Glu Gly Met Leu Ala Asn
100 105 110

Leu Val Glu Gln Asn Ile Ser Val Arg Arg Arg Gln Gly Val Ser Ile
115 120 125

Gly Arg Leu His Lys Gln Arg Lys Pro Asp Arg Arg Lys Arg Ser Glu
130 135 140

Arg Pro Ile Asp Arg
145

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 60 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

ATCGACCTCT CTCCTCTGCG TGTTGAAGAA GTTCAAAACG TTATCAACGC

ATCCTGGAAT GTCCAATCTG



(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 101 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

GGTTCAGCAG CTTCAGCATA CAGAACTTAC AGAAGATGTG GTCACACTTA

GTTCCTTGAT CAGTTCCAGA CAGATTGGAC ATTCCAGGAT C

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 102 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

GTATGCTGAA GCTGCTGAAC CAAAAGAAGG GTCCATCTCA ATGTCCACTG

ACATCACTAA GCGTTCTCTG CAAGAATCTA CTCGTTTCTC TC



(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 81 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

TTCCAGACCA GTGTCCAGCT GGAAAGCACA CATCATCTTC AGCAGTTCTT CAACCAGTTG 60

AGAGAAACGA GTAGATTCTT G 81

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

GCTAGAATTC ACCATGGACC TGTCTGCTCT G 31

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: other nucleic acid

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

GCTAGTCGAC TTCCAGACCA GTGTCCAG 28 
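
The LENGTH fields declared for the oligonucleotides of SEQ ID NOS: 13-15 can be checked against the sequences as printed. The Python sketch below is illustrative only and is not part of the filed listing; the dictionary `OLIGOS` and the helper `actual_length` are hypothetical names, and the sequences are copied from the listing in their grouped form (blocks of ten bases separated by spaces). SEQ ID NOS: 10-12 are left out because their printed rows appear incomplete in this copy.

```python
# Declared LENGTH values and grouped sequences copied from SEQ ID NOS: 13-15.
OLIGOS = {
    13: (81, "TTCCAGACCA GTGTCCAGCT GGAAAGCACA CATCATCTTC AGCAGTTCTT "
             "CAACCAGTTG AGAGAAACGA GTAGATTCTT G"),
    14: (31, "GCTAGAATTC ACCATGGACC TGTCTGCTCT G"),
    15: (28, "GCTAGTCGAC TTCCAGACCA GTGTCCAG"),
}


def actual_length(grouped):
    """Hypothetical helper: count bases, ignoring spaces between groups."""
    bases = grouped.replace(" ", "")
    unexpected = set(bases) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {unexpected}")
    return len(bases)


for seq_id, (declared, grouped) in OLIGOS.items():
    counted = actual_length(grouped)
    # Print the SEQ ID, the declared length, the counted length, and whether they agree.
    print(seq_id, declared, counted, declared == counted)
```

A mismatch reported by such a check would point to a transcription problem in either the LENGTH field or the sequence row, without indicating which of the two is at fault.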

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 247 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

Met Gly Lys Lys Ile Met Thr Asp Ala Gly Ser Trp Cys Leu Ile Glu
1 5 10 15

Ser Asp Pro Gly Val Phe Thr Glu Met Leu Arg Gly Phe Gly Val Asp
20 25 30

Gly Leu Gln Val Glu Glu Leu Tyr Ser Leu Asp Asp Asp Lys Ala Met
35 40 45

Thr Arg Pro Thr Tyr Gly Leu Ile Phe Leu Phe Lys Trp Arg Gln Gly
50 55 60

Asp Glu Thr Thr Gly Ile Pro Ser Asp Lys Gln Asn Ile Phe Phe Ala
65 70 75 80

His Gln Thr Ile Gln Asn Ala Cys Ala Thr Gln Ala Leu Ile Asn Leu
85 90 95

Leu Met Asn Val Glu Asp Thr Asp Val Lys Leu Gly Asn Ile Leu Asn
100 105 110

Gln Tyr Lys Glu Phe Ala Ile Asp Leu Asp Pro Asn Thr Arg Gly His
115 120 125

Cys Leu Ser Asn Ser Glu Glu Ile Arg Thr Val His Asn Ser Phe Ser
130 135 140

Arg Gln Thr Leu Phe Glu Leu Asp Ile Lys Gly Gly Glu Ser Glu Asp
145 150 155 160

Asn Tyr His Phe Val Thr Tyr Val Pro Ile Gly Asn Lys Val Tyr Glu
165 170 175

Leu Asp Gly Leu Arg Glu Leu Pro Leu Glu Val Ala Glu Phe Gln Lys
180 185 190

Glu Gln Asp Trp Ile Glu Ala Ile Lys Pro Val Ile Gln Gln Arg Met
195 200 205

Gln Lys Tyr Ser Glu Gly Glu Ile Thr Phe Asn Leu Met Ala Leu Val
210 215 220

Pro Asn Arg Lys Gln Lys Leu Gln Glu Met Met Glu Asn Leu Ile Gln
225 230 235 240

Ala Asn Glu Asn Asn Glu Leu
245



(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 223 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

Met Gln Leu Lys Pro Met Glu Ile Asn Pro Glu Met Leu Asn Lys Val
1 5 10 15

Leu Ser Arg Leu Gly Val Ala Gly Gln Trp Arg Phe Val Asp Val Leu
20 25 30

Gly Leu Glu Glu Glu Ser Leu Gly Ser Val Pro Ala Pro Ala Cys Ala
35 40 45

Leu Leu Leu Leu Phe Pro Leu Thr Ala Gln His Glu Asn Phe Arg Lys
50 55 60

Lys Gln Ile Glu Glu Leu Lys Gly Gln Glu Val Ser Pro Lys Val Tyr
65 70 75 80

Phe Met Lys Gln Thr Ile Gly Asn Ser Cys Gly Thr Ile Gly Leu Ile
85 90 95

His Ala Val Ala Asn Asn Gln Asp Lys Leu Gly Phe Glu Asp Gly Ser
100 105 110

Val Leu Lys Gln Phe Leu Ser Glu Thr Glu Lys Met Ser Pro Glu Asp
115 120 125

Arg Ala Lys Cys Phe Glu Lys Asn Glu Ala Ile Gln Ala Ala His Asp
130 135 140

Ala Val Ala Gln Glu Gly Gln Cys Arg Val Asp Asp Lys Val Asn Phe
145 150 155 160

His Phe Ile Leu Phe Asn Asn Val Asp Gly His Leu Tyr Glu Leu Asp
165 170 175

Gly Arg Met Pro Phe Pro Val Asn His Gly Ala Ser Ser Glu Asp Thr
180 185 190

Leu Leu Lys Asp Ala Ala Lys Val Cys Arg Glu Phe Thr Glu Arg Glu
195 200 205

Gln Gly Glu Val Arg Phe Ser Ala Val Ala Leu Cys Lys Ala Ala
210 215 220



(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 230 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

Met Glu Gly Gln Arg Trp Leu Pro Leu Glu Ala Asn Pro Glu Val Thr
1 5 10 15

Asn Gln Phe Leu Lys Gln Leu Gly Leu His Pro Asn Trp Gln Phe Val
20 25 30

Asp Val Tyr Gly Met Asp Pro Glu Leu Leu Ser Met Val Pro Arg Pro
35 40 45

Val Cys Ala Val Leu Leu Leu Phe Pro Ile Thr Glu Lys Tyr Glu Val
50 55 60

Phe Arg Thr Glu Glu Glu Glu Lys Ile Lys Ser Gln Gly Gln Asp Val
65 70 75 80

Thr Ser Ser Val Tyr Phe Met Lys Gln Thr Ile Ser Asn Ala Cys Gly
85 90 95

Thr Ile Gly Leu Ile His Ala Ile Ala Asn Asn Lys Asp Lys Met His
100 105 110

Phe Glu Ser Gly Ser Thr Leu Lys Lys Phe Leu Glu Glu Ser Val Ser
115 120 125

Met Ser Pro Glu Glu Arg Ala Arg Tyr Leu Glu Asn Tyr Asp Ala Ile
130 135 140

Arg Val Thr His Glu Thr Ser Ala His Glu Gly Gln Thr Glu Ala Pro
145 150 155 160

Ser Ile Asp Glu Lys Val Asp Leu His Phe Ile Ala Leu Val His Val
165 170 175

Asp Gly His Leu Tyr Glu Leu Asp Gly Arg Lys Pro Phe Pro Ile Asn
180 185 190

His Gly Glu Thr Ser Asp Glu Thr Leu Leu Glu Asp Ala Ile Glu Val
195 200 205

Cys Lys Lys Phe Met Glu Arg Asp Pro Asp Glu Leu Arg Phe Asn Ala
210 215 220

Ile Ala Leu Ser Ala Ala
225 230

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 43 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

Leu Leu Arg Cys Ser Arg Cys Thr Asn Ile Leu Arg Glu Pro Val Cys
1 5 10 15

Leu Gly Gly Cys Glu His Ile Phe Cys Ser Asn Cys Val Ser Asp Cys
20 25 30

Ile Gly Thr Gly Cys Pro Val Cys Tyr Thr Pro
35 40



(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 326 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

Met Gly Lys Lys Ile Met Thr Asp Ala Gly Ser Trp Cys Leu Ile Glu
1 5 10 15

Ser Asp Pro Gly Val Phe Thr Glu Met Leu Arg Gly Phe Gly Val Asp
20 25 30

Gly Leu Gln Val Glu Glu Leu Tyr Ser Leu Asp Asp Asp Lys Ala Met
35 40 45

Thr Arg Pro Thr Tyr Gly Leu Ile Phe Leu Phe Lys Trp Arg Gln Gly
50 55 60

Asp Glu Thr Thr Gly Ile Pro Ser Asp Lys Gln Asn Ile Phe Phe Ala
65 70 75 80

His Gln Thr Ile Gln Asn Ala Cys Ala Thr Gln Ala Leu Ile Asn Leu
85 90 95

Leu Met Asn Val Glu Asp Thr Asp Val Lys Leu Gly Asn Ile Leu Asn
100 105 110

Gln Tyr Lys Glu Phe Ala Ile Asp Leu Asp Pro Asn Thr Arg Gly His
115 120 125

Cys Leu Ser Asn Ser Glu Glu Ile Arg Thr Val His Asn Ser Phe Ser
130 135 140

Arg Gln Thr Leu Phe Glu Leu Asp Ile Lys Gly Gly Glu Ser Glu Asp
145 150 155 160

Asn Tyr His Phe Val Thr Tyr Val Pro Ile Gly Asn Lys Val Tyr Glu
165 170 175

Leu Asp Gly Leu Arg Glu Leu Pro Leu Glu Val Ala Glu Phe Gln Lys
180 185 190

Glu Gln Asp Trp Ile Glu Ala Ile Lys Pro Val Ile Gln Gln Arg Met
195 200 205

Gln Lys Tyr Ser Glu Gly Glu Ile Thr Phe Asn Leu Met Ala Leu Val
210 215 220

Pro Asn Arg Lys Gln Lys Leu Gln Glu Met Met Glu Asn Leu Ile Gln
225 230 235 240

Ala Asn Glu Asn Asn Glu Leu Glu Glu Gln Ile Ala Asp Leu Asn Lys
245 250 255

Ala Ile Ala Asp Glu Asp Tyr Lys Met Glu Met Tyr Arg Lys Glu Asn
260 265 270

Asn Arg Arg Arg His Asn Tyr Thr Pro Phe Val Ile Glu Leu Met Lys
275 280 285

Ile Leu Ala Lys Glu Gly Lys Leu Val Gly Leu Val Asp Asn Ala Tyr
290 295 300

Gln Ala Ala Lys Glu Lys Ser Lys Leu Asn Thr Asp Ile Thr Lys Leu
305 310 315 320

Glu Leu Lys Arg Lys Gln
325



(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 227 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

Met Leu Thr Trp Thr Pro Leu Glu Ser Asn Pro Glu Val Leu Thr Lys
1 5 10 15

Tyr Ile His Lys Leu Ala Val Ser Pro Ala Trp Ser Val Thr Asp Val
20 25 30

Ile Gly Leu Glu Asp Asp Thr Leu Glu Trp Ile Pro Arg Pro Val Lys
35 40 45

Ala Phe Ile Leu Leu Phe Pro Cys Ser Glu Thr Tyr Glu Lys His Arg
50 55 60

Thr Glu Glu His Asp Arg Ile Lys Glu Val Glu Glu Gln His Pro Glu
65 70 75 80

Asp Leu Phe Tyr Met Arg Gln Phe Thr His Asn Ala Cys Gly Thr Val
85 90 95

Ala Leu Ile His Ser Val Ala Asn Asn Lys Glu Val Asp Ile Asp Arg
100 105 110

Gly Val Leu Lys Asp Phe Leu Glu Lys Thr Ala Ser Leu Ser Pro Glu
115 120 125

Glu Arg Gly Arg Ala Leu Glu Lys Asp Glu Lys Phe Thr Ala Asp His
130 135 140

Glu Ala Leu Ala Gln Glu Gly Gln Thr Asn Ala Ala Asn His Glu Lys
145 150 155 160

Val Ile His His Phe Ile Ala Leu Val Asn Lys Glu Gly Thr Leu Tyr
165 170 175

Glu Leu Asp Gly Arg Lys Ser Phe Pro Ile Lys His Gly Pro Thr Ser
180 185 190

Glu Glu Thr Phe Val Lys Asp Ala Ala Lys Val Cys Lys Glu Phe Met
195 200 205

Ala Arg Asp Pro Asn Glu Val Arg Phe Thr Val Leu Ala Leu Thr Ala
210 215 220

Ala Gln Gln
225



(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 236 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

Met Ser Gly Glu Asn Arg Ala Val Val Pro Ile Glu Ser Asn Pro Glu
1 5 10 15

Val Phe Thr Asn Phe Ala His Lys Leu Gly Leu Lys Asn Glu Trp Ala
20 25 30

Tyr Phe Asp Ile Tyr Ser Leu Thr Glu Pro Glu Leu Leu Ala Phe Leu
35 40 45

Pro Arg Pro Val Lys Ala Ile Val Leu Leu Phe Pro Ile Asn Glu Asp
50 55 60

Arg Lys Ser Ser Thr Ser Gln Gln Ile Thr Ser Ser Tyr Asp Val Ile
65 70 75 80

Trp Phe Lys Gln Ser Val Lys Asn Ala Cys Gly Leu Tyr Ala Ile Leu
85 90 95

His Ser Leu Ser Asn Asn Gln Ser Leu Leu Glu Pro Gly Ser Asp Leu
100 105 110

Asp Asn Phe Leu Lys Ser Gln Ser Asp Thr Ser Ser Ser Lys Asn Arg
115 120 125

Phe Asp Asp Val Thr Thr Asp Gln Phe Val Leu Asn Val Ile Lys Glu
130 135 140

Asn Val Gln Thr Phe Ser Thr Gly Gln Ser Glu Ala Pro Glu Ala Thr
145 150 155 160

Ala Asp Thr Asn Leu His Tyr Ile Thr Tyr Val Glu Glu Asn Gly Gly
165 170 175

Ile Phe Glu Leu Asp Gly Arg Asn Leu Ser Gly Pro Leu Tyr Leu Gly
180 185 190

Lys Ser Asp Pro Thr Ala Thr Asp Leu Ile Glu Gln Glu Leu Val Arg
195 200 205

Val Arg Val Ala Ser Tyr Met Glu Asn Ala Asn Glu Glu Asp Val Leu
210 215 220

Asn Phe Ala Met Leu Gly Leu Gly Pro Asn Trp Glu
225 230 235



(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 230 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

Met Glu Gly Gln Arg Trp Leu Pro Leu Glu Ala Asn Pro Glu Val Thr 
1 5 10 15 




Asn Gln Phe Leu Lys Gln Leu Gly Leu His Pro Asn Trp Gln Phe Val 
20 25 30 

Asp Val Tyr Gly Met Asp Pro Glu Leu Leu Ser Met Val Pro Arg Pro 
35 40 45 

Val Cys Ala Val Leu Leu Leu Phe Pro Ile Thr Glu Lys Tyr Glu Val 
50 55 60 

Phe Arg Thr Glu Glu Glu Glu Lys Ile Lys Ser Gln Gly Gln Asp Val 
65 70 75 80 

Thr Ser Ser Val Tyr Phe Met Lys Gln Thr Ile Ser Asn Ala Cys Gly 
85 90 95 

Thr Ile Gly Leu Ile His Ala Ile Ala Asn Asn Lys Asp Lys Met His 
100 105 110 

Phe Glu Ser Gly Ser Thr Leu Lys Lys Phe Leu Glu Glu Ser Val Ser 
115 120 125 

Met Ser Pro Glu Glu Arg Ala Arg Tyr Leu Glu Asn Tyr Asp Ala Ile 
130 135 140 

Arg Val Thr His Glu Thr Ser Ala His Glu Gly Gln Thr Glu Ala Pro 
145 150 155 160 

Ser Ile Asp Glu Lys Val Asp Leu His Phe Ile Ala Leu Val His Val 
165 170 175 

Asp Gly His Leu Tyr Glu Leu Asp Gly Arg Lys Pro Phe Pro Ile Asn 
180 185 190 

His Gly Glu Thr Ser Asp Glu Thr Leu Leu Glu Asp Ala Ile Glu Val 
195 200 205 

Cys Lys Lys Phe Met Glu Arg Asp Pro Asp Glu Leu Arg Phe Asn Ala 
210 215 220 

Ile Ala Leu Ser Ala Ala 
225 230 



(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 223 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 

Met Gln Leu Lys Pro Met Glu Ile Asn Pro Glu Met Leu Asn Lys Val 
1 5 10 15 

Leu Ser Arg Leu Gly Val Ala Gly Gln Trp Arg Phe Val Asp Val Leu 
20 25 30 

Gly Leu Glu Glu Glu Ser Leu Gly Ser Val Pro Ala Pro Ala Cys Ala 
35 40 45 

Leu Leu Leu Leu Phe Pro Leu Thr Ala Gln His Glu Asn Phe Arg Lys 
50 55 60 

Lys Gln Ile Glu Glu Leu Lys Gly Gln Glu Val Ser Pro Lys Val Tyr 
65 70 75 80 

Phe Met Lys Gln Thr Ile Gly Asn Ser Cys Gly Thr Ile Gly Leu Ile 
85 90 95 

His Ala Val Ala Asn Asn Gln Asp Lys Leu Gly Phe Glu Asp Gly Ser 
100 105 110 

Val Leu Lys Gln Phe Leu Ser Glu Thr Glu Lys Met Ser Pro Glu Asp 
115 120 125 

Arg Ala Lys Cys Phe Glu Lys Asn Glu Ala Ile Gln Ala Ala His Asp 
130 135 140 

Ala Val Ala Gln Glu Gly Gln Cys Arg Val Asp Asp Lys Val Asn Phe 
145 150 155 160 

His Phe Ile Leu Phe Asn Asn Val Asp Gly His Leu Tyr Glu Leu Asp 
165 170 175 

Gly Arg Met Pro Phe Pro Val Asn His Gly Ala Ser Ser Glu Asp Thr 
180 185 190 

Leu Leu Lys Asp Ala Ala Lys Val Cys Arg Glu Phe Thr Glu Arg Glu 
195 200 205 

Gln Gly Glu Val Arg Phe Ser Ala Val Ala Leu Cys Lys Ala Ala 
210 215 220 



(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

CCATCTCAAG GTCCACTGTG TAAG 24 



(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(iv) ANTI-SENSE: YES 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

CTTACACAGT GGACCTTGAG ATGG 24 

(2) INFORMATION FOR SEQ ID NO:27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

CAATGTCCAC TGGGTAAGAA CGACATC 27 

(2) INFORMATION FOR SEQ ID NO:28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(iv) ANTI-SENSE: YES 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 

GATGTCGTTC TTACCCAGTG GACATTG 27 

(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 79 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 

GCATGGATCC TCAAACCTTG TGCAGGCAGG TACCCTGGTC AACAGGAGAC AGGTGGGAAA 60 

CCAGGATCTT TTGCATAGC 79 

(2) INFORMATION FOR SEQ ID NO:30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 

CCGATGCCCT TGGAATTGAC GAG 

(2) INFORMATION FOR SEQ ID NO:31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: 

CGATGAATTC GAGCTAGCTT CTATC 

(2) INFORMATION FOR SEQ ID NO:32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 

GCATGAATTC TCAGCTCCGG CGCACTGAGA TG 



(2) INFORMATION FOR SEQ ID NO:33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

GCATGAATTC TCAAGCCAGC ATGGATATGA AGG 



(2) INFORMATION FOR SEQ ID NO:34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 

GCATGAATTC TCAGTCATCA ATCTTCAACT TC 



(2) INFORMATION FOR SEQ ID NO:35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: 

GCATGAATTC TCATGCAATC TCGGCTTCTA C 



(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

GCATGGATCC CCAAGATTGA TGACCAGCGA AGG 



(2) INFORMATION FOR SEQ ID NO:37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 

GCTGGCCAAC CCGGTGGAAC AG 



(2) INFORMATION FOR SEQ ID NO:38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 



(iv) ANTI-SENSE: YES 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 

CTGTTCCACC GGGTTGGCCA GC 22 



(2) INFORMATION FOR SEQ ID NO:39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 56 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: 

CCTGTTATTA ACCCTCACTA AAGGGAAGGG TACCATGAAT AAGGGCTGGC TGGAGC 56 



(2) INFORMATION FOR SEQ ID NO:40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 

GAAGCGGATG TCGTGGTAGG 20 



(2) INFORMATION FOR SEQ ID NO:41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 

GATGTATATA ACTATCTATT CG 22 



(2) INFORMATION FOR SEQ ID NO:42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 34 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 




(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: 

GCATACATCT TCACCCCTGG CTGCCTTGGA TTGG 



(2) INFORMATION FOR SEQ ID NO:43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 



GAAGCGGATG TCGTGGTAGG 



(2) INFORMATION FOR SEQ ID NO:44: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 

GATGTATATA ACTATCTATT CG 

(2) INFORMATION FOR SEQ ID NO:45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: 

CGTAGTCGAC TGTCAGCGCC AGGGGACTC 



(2) INFORMATION FOR SEQ ID NO:46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46: 

CAACCCCACT CCCATTGTC 19 



(2) INFORMATION FOR SEQ ID NO:47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: other nucleic acid 

(iv) ANTI-SENSE: YES 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

GAGTTGGTGT TCTGCACGTC 20 
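
Several oligonucleotides in the listing above carry the feature ANTI-SENSE: YES directly after a primer of the same length (SEQ ID NOS: 25/26, 27/28 and 37/38). The following is a minimal sketch, not part of the application, of how such pairs can be checked for being exact reverse complements. The helper names and the choice of pairs are assumptions made for illustration, and the character printed as "6" in SEQ ID NO:28 is read here as G.

```python
# Illustrative consistency check; names and pairings are assumptions,
# not part of the sequence listing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, ignoring spaces."""
    cleaned = seq.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


# Sequences copied from the listing, keyed by SEQ ID NO.
PRIMERS = {
    25: "CCATCTCAAG GTCCACTGTG TAAG",
    26: "CTTACACAGT GGACCTTGAG ATGG",
    27: "CAATGTCCAC TGGGTAAGAA CGACATC",
    28: "GATGTCGTTC TTACCCAGTG GACATTG",  # "6" in the printed text read as G
    37: "GCTGGCCAAC CCGGTGGAAC AG",
    38: "CTGTTCCACC GGGTTGGCCA GC",
}

# (sense, anti-sense) pairs assumed from adjacency in the listing.
PAIRS = [(25, 26), (27, 28), (37, 38)]


def check_pairs() -> dict:
    """Map each pair to True when the second entry is the reverse complement of the first."""
    results = {}
    for sense, anti in PAIRS:
        expected = reverse_complement(PRIMERS[sense])
        results[(sense, anti)] = expected == PRIMERS[anti].replace(" ", "")
    return results


if __name__ == "__main__":
    for (sense, anti), ok in check_pairs().items():
        status = "reverse complement" if ok else "MISMATCH"
        print(f"SEQ ID NO:{sense} / SEQ ID NO:{anti}: {status}")
```

A mismatch reported by such a check would point to a transcription error in one member of the pair rather than to any property of the primers themselves.
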
WHAT IS CLAIMED IS: 

1. A nucleic acid sequence encoding mammalian 
BRCA1 Associated Protein (BAP-1) or a fragment thereof, 
isolated from cellular materials with which it is 
naturally associated. 

2. The nucleic acid sequence according to claim 1 
which is selected from the group consisting of: 

(a) SEQ ID NO:1; 

(b) a sequence which hybridizes to (a) under 
stringent conditions; 

(c) an allelic variant of (a) or (b); and 

(d) a fragment of (a) or (b). 

3. The sequence according to claim 2 wherein said 
fragment is selected from the group consisting of: 

(a) open reading frame, nucleotides about 40 
to about 2226 of SEQ ID NO:1; 

(b) a region of acidity, nucleotides about 
1225 to about 1263 of SEQ ID NO:1; and 

(c) interactive domain, nucleotides about 1831 
to about 2226 of SEQ ID NO:1. 

4. The sequence according to claim 1 which encodes 
human BAP-1 SEQ ID NO:2 or a fragment thereof. 

5. A mammalian BRCA1 associated protein (BAP-1) or 
a peptide fragment thereof. 

6. The BAP-1 protein according to claim 5, said 
protein comprising an amino acid sequence selected from 
the group consisting of: 

(a) human BAP-1, SEQ ID NO:2; 

(b) fragment of (a); 

(c) analogues of (a) characterized by having 
at least about 85% homology with SEQ ID NO:2; and 

(d) homologs of (a) characterized by having at 
least about 85% homology with SEQ ID NO:2. 

7. The BAP-1 protein according to claim 5, wherein 
said fragment is selected from the group consisting of: 

(a) amino acids about 656 to about 661 of SEQ 
ID NO:2; 

(b) amino acids about 717 to about 722 of SEQ 
ID NO:2; 

(c) amino acids about 396 to about 408 of SEQ ID 
NO:2; 

(d) amino acids about 598 to about 729 of SEQ 
ID NO:2; 

(e) amino acids about 483 to about 576 of SEQ 
ID NO:2; 

(f) amino acids about 1 to about 214 of SEQ ID 
NO:2; 

(g) amino acids about 1 to about 426 of SEQ ID 
NO:2; 

(h) amino acids about 1 to about 352 of SEQ ID 
NO:2; 

(i) amino acids about 1 to about 325 of SEQ ID 
NO:2; 

(j) amino acids about 1 to about 313 of SEQ ID 
NO:2; and 

(k) smaller fragments of (a) - (j) comprising 
about 8 amino acids. 

8. A vector comprising a mammalian nucleic acid 
sequence encoding a BRCA1 associated protein (BAP-1) or 
peptide under the control of suitable regulatory 
sequences. 

9. The vector according to claim 8, wherein said 
vector is a gene therapy vector. 

10. A host cell transformed with the vector 
according to claim 8. 

11. A method of recombinantly expressing BRCA1 
associated protein (BAP-1) by culturing a recombinant 
host cell transformed with nucleic acid sequence encoding 
BAP-1 under conditions which permit expression of BAP-1. 

12. A diagnostic reagent comprising a nucleic acid 
sequence selected from the group consisting of: 

(a) SEQ ID NO:1 and its complementary 
sequence; 

(b) a nucleotide sequence encoding amino acids 
598 to 729 of SEQ ID NO:2 and its complementary 
sequence; 

(c) a nucleic acid fragment of (a) or (b) 
comprising at least 15 nucleotides in length; 

(d) a sequence which hybridizes to (a), (b) or 
(c) under stringent conditions; 

and a detectable label which is associated with 
said sequence. 

13. An anti-BRCA1 associated protein (BAP-1) 
antibody. 

14. The antibody according to claim 13, wherein 
said antibody binds to a peptide selected from the group 
consisting of 

(a) SEQ ID NO:2; 

(b) amino acids 483 to 576 of SEQ ID NO:2; 

(c) amino acids 598 to 729 of SEQ ID NO:2; 
and 

(d) a fragment of (a), (b) or (c) comprising 
about 8 amino acids. 

15. The antibody according to claim 13, selected 
from the group consisting of a chimeric antibody, a 
humanized antibody, a monoclonal antibody and a 
polyclonal antibody. 

16. An anti-idiotype antibody specific for the 
antibody of claim 13. 

17. A diagnostic reagent comprising the antibody 
according to claim 13 and a detectable label. 

18. A method of detecting a cancer involving BRCA1 
comprising providing a biopsy sample from a patient 
suspected of having said cancer and incubating said 
sample in the presence of a diagnostic reagent according 
to claim 12 or 17. 

19. A method of detecting a deficiency in BRCA1 
associated protein (BAP-1) in a patient comprising 
providing a sample from a patient suspected of having 
said deficiency and performing the polymerase chain 
reaction using the diagnostic reagent according to claim 
12. 

20. A method of identifying compounds which 
specifically bind to BAP-1 or a fragment thereof, 
comprising the steps of contacting said BAP-1 or fragment 
with a test compound to permit binding of the test 
compound to BAP-1; and determining the amount of test 
compound which is bound to BAP-1. 

21. The method according to claim 20 wherein said 
BAP-1 is immobilized on a solid support. 



22. A compound identified by the method of claim 
20. 

FIGURE 3A 

GGCACGAGGC ATGGCGCTGA GGGGCCGCCC CGCGGGAAG ATG AAT 45 
Met Asn 
1 

AAG GGC TGG CTG GAG CTG GAG AGC GAC CCA GGC CTC TTC ACC CTG 90 
Lys Gly Trp Leu Glu Leu Glu Ser Asp Pro Gly Leu Phe Thr Leu 
5 10 15 

CTC GTG GAA GAT TTC GGT GTC AAG GGG GTG CAA GTG GAG GAG ATC 135 
Leu Val Glu Asp Phe Gly Val Lys Gly Val Gln Val Glu Glu Ile 
20 25 30 

TAC GAC CTT CAG AGC AAA TGT CAG GGC CCT GTA TAT GGA TTT ATC 180 
Tyr Asp Leu Gln Ser Lys Cys Gln Gly Pro Val Tyr Gly Phe Ile 
35 40 45 

TTC CTG TTC AAA TGG ATC GAA GAG CGC CGG TCC CGG CGA AAG GTC 225 
Phe Leu Phe Lys Trp Ile Glu Glu Arg Arg Ser Arg Arg Lys Val 
50 55 60 

TCT ACC TTG GTG GAT GAT ACG TCC GTG ATT GAT GAT GAT ATT GTG 270 
Ser Thr Leu Val Asp Asp Thr Ser Val Ile Asp Asp Asp Ile Val 
65 70 75 

AAT AAC ATG TTC TTT GCC CAC CAG CTG ATA CCC AAC TCT TGT GCA 315 
Asn Asn Met Phe Phe Ala His Gln Leu Ile Pro Asn Ser Cys Ala 
80 85 90 

ACT CAT GCC TTG CTG AGC GTG CTC CTG AAC TGC AGC AGC GTG GAC 360 
Thr His Ala Leu Leu Ser Val Leu Leu Asn Cys Ser Ser Val Asp 
95 100 105 

CTG GGA CCC ACC CTG AGT CGC ATG AAG GAC TTC ACC AAG GGT TTC 405 
Leu Gly Pro Thr Leu Ser Arg Met Lys Asp Phe Thr Lys Gly Phe 
110 115 120 

AGC CCT GAG AGC AAA GGA TAT GCG ATT GGC AAT GCC CCG GAG TTG 450 
Ser Pro Glu Ser Lys Gly Tyr Ala Ile Gly Asn Ala Pro Glu Leu 
125 130 135 

GCC AAG GCC CAT AAT AGC CAT GCC AGG CCC GAG CCA CGC CAC CTC 495 
Ala Lys Ala His Asn Ser His Ala Arg Pro Glu Pro Arg His Leu 
140 145 150 

CCT GAG AAG CAG AAT GGC CTT AGT GCA GTG CGG ACC ATG GAG GCG 540 
Pro Glu Lys Gln Asn Gly Leu Ser Ala Val Arg Thr Met Glu Ala 
155 160 165 

TTC CAC TTT GTC AGC TAT GTG CCT ATC ACA GGC CGG CTC TTT GAG 585 
Phe His Phe Val Ser Tyr Val Pro Ile Thr Gly Arg Leu Phe Glu 
170 175 180 

FIGURE 3B 

CTG GAT GGG CTG AAG GTC TAC CCC ATT GAC CAT GGG CCC TGG GGG 630 
Leu Asp Gly Leu Lys Val Tyr Pro Ile Asp His Gly Pro Trp Gly 
185 190 195 

GAG GAC GAG GAG TGG ACA GAC AAG GCC CGG CGG GTC ATC ATG GAG 675 
Glu Asp Glu Glu Trp Thr Asp Lys Ala Arg Arg Val Ile Met Glu 
200 205 210 

CGT ATC GGC CTC GCC ACT GCA GGG GAG CCC TAC CAC GAC ATC CGC 720 
Arg Ile Gly Leu Ala Thr Ala Gly Glu Pro Tyr His Asp Ile Arg 
215 220 225 

TTC AAC CTG ATG GCA GTG GTG CCC GAC CGC AGG ATC AAG TAT GAG 765 
Phe Asn Leu Met Ala Val Val Pro Asp Arg Arg Ile Lys Tyr Glu 
230 235 240 

GCC AGG CTG CAT GTG CTG AAG GTG AAC CGT CAG ACA GTA CTA GAG 810 
Ala Arg Leu His Val Leu Lys Val Asn Arg Gln Thr Val Leu Glu 
245 250 255 

GCT CTG CAG CAG CTG ATA AGA GTA ACA CAG CCA GAG CTG ATT CAG 855 
Ala Leu Gln Gln Leu Ile Arg Val Thr Gln Pro Glu Leu Ile Gln 
260 265 270 

ACC CAC AAG TCT CAA GAG TCA CAG CTG CCT GAG GAG TCC AAG TCA 900 
Thr His Lys Ser Gln Glu Ser Gln Leu Pro Glu Glu Ser Lys Ser 
275 280 285 

GCC AGC AAC AAG TCC CCG CTG GTG CTG GAA GCA AAC AGG GCC CCT 945 
Ala Ser Asn Lys Ser Pro Leu Val Leu Glu Ala Asn Arg Ala Pro 
290 295 300 

GCA GCC TCT GAG GGC AAC CAC ACA GAT GGT GCA GAG GAG GCG GCT 990 
Ala Ala Ser Glu Gly Asn His Thr Asp Gly Ala Glu Glu Ala Ala 
305 310 315 

GGT TCA TGC GCA CAA GCC CCA TCC CAC AGC CCT CCC AAC AAA CCC 1035 
Gly Ser Cys Ala Gln Ala Pro Ser His Ser Pro Pro Asn Lys Pro 
320 325 330 

AAG CTA GTG GTG AAG CCT CCA GGC AGC AGC CTC AAT GGG GTT CAC 1080 
Lys Leu Val Val Lys Pro Pro Gly Ser Ser Leu Asn Gly Val His 
335 340 345 

CCC AAC CCC ACT CCC ATT GTC CAG CGG CTG CCG GCC TTT CTA GAC 1125 
Pro Asn Pro Thr Pro Ile Val Gln Arg Leu Pro Ala Phe Leu Asp 
350 355 360 

AAT CAC AAT TAT GCC AAG TCC CCC ATG CAG GAG GAA GAA GAC CTG 1170 
Asn His Asn Tyr Ala Lys Ser Pro Met Gln Glu Glu Glu Asp Leu 
365 370 375 

FIGURE 3C 

GCG GCA GGT GTG GGC CGC AGC CGA GTT CCA GTC CGC CCA CCC CAG 1215 
Ala Ala Gly Val Gly Arg Ser Arg Val Pro Val Arg Pro Pro Gln 
380 385 390 

CAG TAC TCA GAT GAT GAG GAT GAC TAT GAG GAT GAC GAG GAG GAT 1260 
Gln Tyr Ser Asp Asp Glu Asp Asp Tyr Glu Asp Asp Glu Glu Asp 
395 400 405 

GAC GTG CAG AAC ACC AAC TCT GCC CTT AGG TAT AAG GGG AAG GGA 1305 
Asp Val Gln Asn Thr Asn Ser Ala Leu Arg Tyr Lys Gly Lys Gly 
410 415 420 

ACA GGG AAG CCA GGG GCA TTG AGC GGT TCT GCT GAT GGG CAA CTG 1350 
Thr Gly Lys Pro Gly Ala Leu Ser Gly Ser Ala Asp Gly Gln Leu 
425 430 435 

TCA GTG CTG CAG CCC AAC ACC ATC AAC GTC TTG GCT GAG AAG CTC 1395 
Ser Val Leu Gln Pro Asn Thr Ile Asn Val Leu Ala Glu Lys Leu 
440 445 450 

AAA GAG TCC CAG AAG GAC CTC TCA ATT CCT CTG TCC ATC AAG ACT 1440 
Lys Glu Ser Gln Lys Asp Leu Ser Ile Pro Leu Ser Ile Lys Thr 
455 460 465 

AGC AGC GGG GCT GGG AGT CCG GCT GTG GCA GTG CCC ACA CAC TCG 1485 
Ser Ser Gly Ala Gly Ser Pro Ala Val Ala Val Pro Thr His Ser 
470 475 480 

CAG CCC TCA CCC ACC CCC AGC AAT GAG AGT ACA GAC ACG GCC TCT 1530 
Gln Pro Ser Pro Thr Pro Ser Asn Glu Ser Thr Asp Thr Ala Ser 
485 490 495 

GAG ATC GGC AGT GCT TTC AAC TCG CCA CTG CGC TCG CCT ATC CGC 1575 
Glu Ile Gly Ser Ala Phe Asn Ser Pro Leu Arg Ser Pro Ile Arg 
500 505 510 

TCA GCC AAC CCG ACG CGG CCC TCC AGC CCT GTC ACC TCC CAC ATC 1620 
Ser Ala Asn Pro Thr Arg Pro Ser Ser Pro Val Thr Ser His Ile 
515 520 525 

TCC AAG GTG CTT TTT GGA GAG GAT GAC AGC CTG CTG CGT GTT GAC 1665 
Ser Lys Val Leu Phe Gly Glu Asp Asp Ser Leu Leu Arg Val Asp 
530 535 540 

TGC ATA CGC TAC AAC CGT GCT GTC CGT GAT CTG GGT CCT GTC ATC 1710 
Cys Ile Arg Tyr Asn Arg Ala Val Arg Asp Leu Gly Pro Val Ile 
545 550 555 

AGC ACA GGC CTG CTG CAC CTG GCT GAG GAT GGG GTG CTG AGT CCC 1755 
Ser Thr Gly Leu Leu His Leu Ala Glu Asp Gly Val Leu Ser Pro 
560 565 570 

FIGURE 3D 

CTG GCG CTG ACA GAG GGT GGG AAG GGT TCC TCG CCC TCC ATC AGA 1800 
Leu Ala Leu Thr Glu Gly Gly Lys Gly Ser Ser Pro Ser Ile Arg 
575 580 585 

CCA ATC CAA GGC AGC CAG GGG TCC AGC AGC CCA GTG GAG AAG GAG 1845 
Pro Ile Gln Gly Ser Gln Gly Ser Ser Ser Pro Val Glu Lys Glu 
590 595 600 

GTC GTG GAA GCC ACG GAC AGC AGA GAG AAG ACG GGG ATG GTG AGG 1890 
Val Val Glu Ala Thr Asp Ser Arg Glu Lys Thr Gly Met Val Arg 
605 610 615 

CCT GGC GAG CCC TTG AGT GGG GAG AAA TAC TCA CCC AAG GAG CTG 1935 
Pro Gly Glu Pro Leu Ser Gly Glu Lys Tyr Ser Pro Lys Glu Leu 
620 625 630 

CTG GCA CTG CTG AAG TGT GTG GAG GCT GAG ATT GCA AAC TAT GAG 1980 
Leu Ala Leu Leu Lys Cys Val Glu Ala Glu Ile Ala Asn Tyr Glu 
635 640 645 

GCG TGC CTC AAG GAG GAG GTA GAG AAG AGG AAG AAG TTC AAG ATT 2025 
Ala Cys Leu Lys Glu Glu Val Glu Lys Arg Lys Lys Phe Lys Ile 
650 655 660 

GAT GAC CAG AGA AGG ACC CAC AAC TAC GAT GAG TTC ATC TGC ACC 2070 
Asp Asp Gln Arg Arg Thr His Asn Tyr Asp Glu Phe Ile Cys Thr 
665 670 675 

TTT ATC TCC ATG CTG GCT CAG GAA GGC ATG CTG GCC AAC CTA GTG 2115 
Phe Ile Ser Met Leu Ala Gln Glu Gly Met Leu Ala Asn Leu Val 
680 685 690 

GAG CAG AAC ATC TCC GTG CGG CGG CGC CAA GGG GTC AGC ATC GGC 2160 
Glu Gln Asn Ile Ser Val Arg Arg Arg Gln Gly Val Ser Ile Gly 
695 700 705 

CGG CTC CAC AAG CAG CGG AAG CCT GAC CGG CGG AAA CGC TCT CGC 2205 
Arg Leu His Lys Gln Arg Lys Pro Asp Arg Arg Lys Arg Ser Arg 
710 715 720 

CCC TAC AAG GCC AAG CGC CAG TGAGGACTGC TGGCCCTGAC TCTGCAGCCC 2256 
Pro Tyr Lys Ala Lys Arg Gln 
725 

ACTCTTGCCG TGTGGCCCTC ACCAGGGTCC TTCCCTGCCC CACTTCCCCT 2306 

TTTCCCAGTA TTACTGAATA GTCCCAGCTG GAGAGTCCAG GCCCTGGGAA 2356 

TGGGAGGAAC CAGGCCACAT TCCTTCCATC GTGCCCTGAG GCCTGACACG 2406 

GCAGATCAGC CCCATAGTGC TCAGGAGGCA GCATCTGGAG TTGGGGCACA 2456 

GCGAGGTACT GCAGCTTCCT CCACAGCCGG CTGTGGAGCA GCAGGACCTG 2506 

FIGURE 3E 